Abstract

This thesis discusses the young fields of quantum pseudo-randomness and quantum
learning algorithms. We present techniques for derandomising algorithms to decrease
randomness resource requirements and improve efficiency. One key object in doing
this is a k-design, which is a distribution on the unitary group whose k-th moments
match those of the unitarily invariant Haar measure. We show that for a natural model
of a random quantum circuit, the distribution of random circuits quickly converges
to a 2-design. We then present an efficient unitary k-design construction for any
k, provided the number of qubits n satisfies k = O(n/log n). In doing this, we
provide an efficient construction of a quantum tensor product expander, which is a
generalisation of a quantum expander which in turn generalises classical expanders.
We then discuss applications of k-designs. We show that they can be used to improve
the efficiency of many existing algorithms and protocols and also find new applications
to derandomising large deviation bounds. In particular, we show that many large
deviation bound results for Haar random unitaries carry over to k-designs for k =
poly(n).

In the second part of the thesis, we present some learning and testing algorithms 
for the Clifford group. We find an optimal algorithm for identifying an unknown 
Clifford operation. We also give an algorithm to test if an unknown operation is close 
to a Clifford or far from every Clifford. 
Chapter 1



Introduction 



Landauer famously said that information is physical [Lan92]. A corollary of this is
that computation is a physical process. It is simply the evolution of a physical state, 
governed by the laws of physics. A classical computer is therefore an information 
processor where the physical evolution is restricted to that of classical physics. A 
quantum computer is more general: quantum evolution is allowed. One might there- 
fore reasonably expect that quantum computers are more powerful. It might be that 
the extra possibilities allowed by quantum evolution allow states to be processed more 
efficiently to speed up the computation. Determining which problems a quantum com- 
puter can solve faster than a classical computer is the central problem in the theory 
of quantum computation. 

Significant progress has already been made in answering this question. Shor's 
algorithm [Sho94] shows that factoring of integers is possible in polynomial time on a
quantum computer. In contrast, it is not known if factoring is possible in polynomial
time on a classical computer. Also, Grover's unstructured search algorithm [Gro97]
allows a marked item in an unsorted database to be found using only the square 
root of the time required on a classical computer. Finding other algorithms and 
provable separations between quantum and classical computation is an important 
area of current research. 

This thesis makes some progress towards finding such new algorithms. In classical
computer science, randomness and pseudo-randomness have been key tools in the
development of new and faster algorithms. In the first part of this thesis, we discuss 
applications and constructions of quantum analogues of these pseudo-random objects. 
Whilst we do not come up with new algorithms based on these, we hope that in the 
future the tools we build will find application in this area. In the second part of this 
thesis, we discuss problems in the theory of machine learning, which is an area in 
which quantum computers could outperform their classical counterparts. 

A side theme in this thesis is the idea that computational complexity must be 
considered in physical models. The converse of our opening statement is also true: 
physical systems store and process information; they are computers. Therefore physi- 
cal systems that could solve problems that are provably difficult do not exist in nature. 
This can rule out models that provide too much computational power. 

In Part [I] we discuss quantum pseudo-randomness. We introduce the subject
in Chapter [2] and provide motivation from the classical computer science literature.
The main idea is to use pseudo-randomness instead of full randomness to decrease
the amount of randomness required. This is desirable in classical computing because
random bits are expensive to produce. In quantum computing random bits can be
obtained by measurement, but uniformly random unitaries and states (formally defined
in Section [2.1]) cannot be produced efficiently, so pseudo-randomness is necessary if
efficiency is desired. In classical computing random bits are often saved by limiting
dependence, for example by using k-wise independent random variables. These are
variables where the distribution of any k variables is the same as for fully independent
random variables but dependencies become apparent when observing more than k of
the variables. We discuss a quantum analogue of this known as a k-design.

In Chapter [3] we show that short random quantum circuits (see Section [1.1] for
background on quantum circuits) are 2-designs, giving an efficient method for
producing a 2-design. Then in Chapter [4] we provide an efficient k-design construction
for all k, giving the first construction for k > 2. In order to do this, we present an
efficient construction of a quantum k-tensor product expander, which is a quantum
analogue of a classical tensor product expander which in turn is a generalisation of
the standard expander used in classical computer science. We then summarise known
applications of k-designs in Chapter [5] as well as providing our own to show that
k-designs exhibit measure concentration which in some cases is almost as strong as for
uniformly random unitaries.

In Part [II] we turn to problems in learning theory. In particular, we consider the
problem of identifying a given black box unitary with as few queries as possible. We
find an algorithm with optimal asymptotic query complexity to identify an unknown
unitary from the Clifford group (defined in Chapter [6]). We also show how this can
be done if the unitary only approximately implements a Clifford, and we present
a testing algorithm to determine if a given operation is close to a Clifford or far from
every Clifford.

1.1 Brief Introduction to Quantum Mechanics 

We now briefly mention some key concepts in quantum mechanics and define some
notation. For a more complete introduction see [NC00].

The state of a d-dimensional quantum system is represented by a vector in the
complex space $\mathbb{C}^d$. If d = 2, we call the system a qubit and often we will take $d = 2^n$
and say the system has n qubits.

We will normally use Dirac notation for quantum states. We write column vectors
as $|\psi\rangle$, with the associated conjugate row vector as $\langle\psi|$. The inner product between
two states is written as $\langle\psi_1|\psi_2\rangle$. We will write $\psi$ for the projector $|\psi\rangle\langle\psi|$. States can
also be probabilistic mixtures of pure states. If the state is $|\psi_i\rangle$ with probability $p_i$
then it has density matrix

$$\rho = \sum_i p_i |\psi_i\rangle\langle\psi_i|.$$

It is often convenient to break the space up into different components, for example
the system and its environment. Mathematically, systems are combined by using the
tensor product. The combined state of system A in state $|\psi_A\rangle$ and system B in state
$|\psi_B\rangle$ is written $|\psi_A\rangle \otimes |\psi_B\rangle$. This leads to the phenomenon of entanglement, which is
when the combined state cannot be written in this product form. For example, the
state $\frac{1}{\sqrt{2}}(|0_A\rangle \otimes |0_B\rangle + |1_A\rangle \otimes |1_B\rangle)$ cannot be written in the product form $|\psi_A\rangle \otimes |\psi_B\rangle$
and so is entangled. To find the state of a subsystem A from a density matrix $\rho_{AB}$, we
take the partial trace. Write $\rho_{AB} = \sum_{ijkl} \rho_{ijkl} |i_A\rangle\langle j_A| \otimes |k_B\rangle\langle l_B|$. Then the reduced
state is $\rho_A = \operatorname{tr}_B \rho_{AB} = \sum_{ijk} \rho_{ijkk} |i_A\rangle\langle j_A|$.
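The partial trace formula above can be sketched numerically. The following is a minimal illustration, assuming NumPy and the ordering convention that the A index is the more significant one in the combined basis; the helper name `partial_trace_B` is hypothetical and not taken from the thesis.

```python
import numpy as np

def partial_trace_B(rho_ab, dim_a, dim_b):
    """Trace out subsystem B from a density matrix on A (x) B."""
    # Reshape so the indices are (i_A, k_B, j_A, l_B), then sum over k_B = l_B.
    rho = rho_ab.reshape(dim_a, dim_b, dim_a, dim_b)
    return np.einsum("ikjk->ij", rho)

# Maximally entangled two-qubit state (|00> + |11>)/sqrt(2).
bell = np.zeros(4)
bell[0] = bell[3] = 1 / np.sqrt(2)
rho_ab = np.outer(bell, bell.conj())

# Reduced state of A: it is mixed, which signals entanglement of the pure state.
rho_a = partial_trace_B(rho_ab, 2, 2)
purity_a = np.trace(rho_a @ rho_a).real
```

For a product state the reduced state would be pure (purity 1); here the reduced state is the maximally mixed qubit state.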

Measurement of a quantum system can be written mathematically in terms of
a POVM (positive operator valued measure). This is a set of positive semi-definite
operators $P_i$ such that $\sum_i P_i = I$. Then the measurement outcomes are the labels i
and outcome i occurs with probability $\operatorname{tr} P_i \rho$ if the state being measured is $\rho$.

The evolution of a closed quantum system is unitary. That is, the quantum state at
a later time t is related by a unitary to the initial quantum state: $\rho_t = U_t \rho_0 U_t^\dagger$. If the
system of interest is part of some larger system then the dynamics need not be unitary.
The most general form of evolution can be written in the Kraus decomposition:
$\rho_t = \sum_i A_i \rho_0 A_i^\dagger$ where $A_i$ are any operators normalised so that $\sum_i A_i^\dagger A_i = I$.

Finally we mention that it is often convenient to think of unitary evolution as 
a quantum circuit, built up of smaller elementary unitary gates. This is in direct 
analogy to the use of circuits in classical computing. Classically, a NAND gate suffices 
to produce any other gate so all classical circuits can be made up of just NAND gates. 
Similarly, there exist sets of unitary gates from which any unitary can be built. An 
example is the following three gates: 



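$$H = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}, \qquad
R_{\pi/4} = \begin{pmatrix} 1 & 0 \\ 0 & e^{i\pi/4} \end{pmatrix}, \qquad
\mathrm{CNOT} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{pmatrix}.$$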

We often seek to build a family of circuits that act on n qubits for all n, where the 
gates are chosen from some elementary set, such as that above. We say that the
circuits are efficient if the number of gates grows only polynomially with n.

1.2 Preliminaries 

Here we define some notation and concepts that are used throughout this thesis. 

1.2.1 Pauli Matrices 

We will often use the Pauli matrices: 




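$$\sigma_0 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \quad
\sigma_1 = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad
\sigma_2 = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \quad
\sigma_3 = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}. \qquad (1.2.1)$$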



We can extend these to matrices on n qubits by taking tensor products. Let $p \in
\{0, 1, 2, 3\}^n$ and $\sigma_p = \sigma_{p_1} \otimes \cdots \otimes \sigma_{p_n}$ where $p_i$ is the value at the i-th position in
the string p. We will sometimes use the alternative notation of $p \in \{I, x, y, z\}^n$. We
will refer to $\sigma_p$ as Pauli matrices on n qubits. There are $4^n$ Pauli matrices and they
are orthogonal i.e. $\operatorname{tr} \sigma_p \sigma_q = 2^n \delta_{pq}$. Note also that $\sigma_p^2 = \sigma_0$, the identity. Also, Pauli
matrices either commute or anticommute.

The Pauli matrices form an orthogonal basis for matrices in $\mathbb{C}^{2^n \times 2^n}$. Therefore
any such matrix A can be written in the form $\sum_p \gamma(p) \sigma_p$, with $\gamma(p) = \frac{1}{2^n} \operatorname{tr} \sigma_p A$.
Sometimes we will choose a different normalisation for the Pauli coefficients $\gamma(p)$ but
will make this clear from the context.
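As a sanity check of the orthogonality relation and the Pauli expansion, here is a small sketch assuming NumPy. The function names (`pauli`, `pauli_coefficients`, `from_coefficients`) are illustrative choices, not notation from the thesis, and the normalisation $\gamma(p) = 2^{-n} \operatorname{tr} \sigma_p A$ is the one stated above.

```python
import itertools
import numpy as np

PAULIS = [
    np.eye(2, dtype=complex),
    np.array([[0, 1], [1, 0]], dtype=complex),
    np.array([[0, -1j], [1j, 0]], dtype=complex),
    np.array([[1, 0], [0, -1]], dtype=complex),
]

def pauli(p):
    """sigma_{p_1} (x) ... (x) sigma_{p_n} for a string p in {0,1,2,3}^n."""
    out = np.eye(1, dtype=complex)
    for index in p:
        out = np.kron(out, PAULIS[index])
    return out

def pauli_coefficients(a, n):
    """Coefficients gamma(p) = tr(sigma_p A) / 2^n of a 2^n x 2^n matrix A."""
    return {p: np.trace(pauli(p) @ a) / 2**n
            for p in itertools.product(range(4), repeat=n)}

def from_coefficients(gamma):
    """Rebuild the matrix sum_p gamma(p) sigma_p."""
    return sum(c * pauli(p) for p, c in gamma.items())

n = 2
rng = np.random.default_rng(0)
a = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
gamma = pauli_coefficients(a, n)
reconstructed = from_coefficients(gamma)   # should equal a up to rounding
```

The matrix `a` is an arbitrary complex matrix, so the check exercises the claim that the Pauli matrices span all of $\mathbb{C}^{2^n \times 2^n}$, not only the Hermitian matrices.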



1.2.2 The Symmetric Group and Permutation Operators 

The symmetric group is the group of all permutations. Let $S_N$ be the symmetric
group on N objects. Then for $\pi \in S_N$ define the corresponding permutation operator

$$S(\pi) := \sum_{i=1}^{N} |\pi(i)\rangle\langle i| \qquad (1.2.2)$$

to be the matrix that permutes the basis states $|1\rangle, \ldots, |N\rangle$ according to $\pi$.

On the other hand, if we have k N-dimensional systems then for $\pi \in S_k$ define
the subsystem permutation operator $S(\pi)$ by

$$S(\pi) := \sum_{n_1=1}^{N} \cdots \sum_{n_k=1}^{N} |n_{\pi^{-1}(1)}, \ldots, n_{\pi^{-1}(k)}\rangle\langle n_1, \ldots, n_k|. \qquad (1.2.3)$$

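Now we present two useful lemmas about subsystem permutation operators.

Lemma 1.2.1. Let C be a cycle of length c in $S_c$. Then

$$\operatorname{tr}\left(C (A_1 \otimes A_2 \otimes \cdots \otimes A_c)\right) = \operatorname{tr}\left(A_{C(1)} A_{C^{\circ 2}(1)} A_{C^{\circ 3}(1)} \cdots A_1\right).$$

Proof. We have

$$\begin{aligned}
\operatorname{tr}(C(A_1 \otimes A_2 \otimes \cdots \otimes A_c))
&= \sum_{i_1, i_2, \ldots, i_c} \langle i_1 i_2 \ldots i_c| C (A_1 \otimes A_2 \otimes \cdots \otimes A_c) |i_1 i_2 \ldots i_c\rangle \\
&= \sum_{i_1, i_2, \ldots, i_c} \langle i_1|A_{C(1)}|i_{C(1)}\rangle \langle i_2|A_{C(2)}|i_{C(2)}\rangle \cdots \langle i_c|A_{C(c)}|i_{C(c)}\rangle \\
&= \sum_{i_1, i_2, \ldots, i_c} \langle i_1|A_{C(1)}|i_{C(1)}\rangle \langle i_{C(1)}|A_{C^{\circ 2}(1)}|i_{C^{\circ 2}(1)}\rangle \cdots \langle i_{C^{\circ (c-1)}(1)}|A_1|i_1\rangle
\end{aligned}$$

since $C^{\circ c}(1) = 1$. Evaluate the sum using the resolution of the identity to get the
result. □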

A simple example of this Lemma is that

$$\operatorname{tr}(\mathcal{F}(A \otimes B)) = \operatorname{tr} AB \qquad (1.2.4)$$

where $\mathcal{F}$ is the swap operator.
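Equation (1.2.4) is easy to check numerically. The sketch below assumes NumPy and builds the swap operator explicitly as the matrix sending $|i, j\rangle$ to $|j, i\rangle$; the helper name `swap_operator` is an illustrative choice.

```python
import numpy as np

def swap_operator(d):
    """Matrix of the swap on two d-dimensional systems: |i, j> -> |j, i>."""
    f = np.zeros((d * d, d * d))
    for i in range(d):
        for j in range(d):
            f[j * d + i, i * d + j] = 1.0
    return f

d = 3
rng = np.random.default_rng(0)
a = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
b = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))

lhs = np.trace(swap_operator(d) @ np.kron(a, b))   # tr(F (A (x) B))
rhs = np.trace(a @ b)                              # tr(AB)
```

The matrices `a` and `b` are arbitrary, so no Hermiticity or positivity is needed for the identity to hold.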

We also work out the Pauli expansion of the swap operator. To stress that this
result does not depend on the choice of orthogonal basis we prove it in full generality.

Lemma 1.2.2. The swap operator $\mathcal{F}$ on two d-dimensional systems can be written
as

$$\mathcal{F} = \frac{1}{d} \sum_p \sigma_p \otimes \sigma_p$$

where $\{\sigma_p\}$ form a Hermitian orthogonal basis with $\operatorname{tr} \sigma_p^2 = d$.

Proof. Expand in the basis and use Lemma [1.2.1]:

$$\operatorname{tr}\left((\sigma_p \otimes \sigma_q) \mathcal{F}\right) = \operatorname{tr} \sigma_p \sigma_q = \begin{cases} d & p = q \\ 0 & \text{otherwise.} \end{cases}$$

The given sum has the correct coefficients in the basis, therefore $\frac{1}{d} \sum_p \sigma_p \otimes \sigma_p = \mathcal{F}$. □
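A quick numerical check of Lemma 1.2.2 can be sketched as follows, assuming NumPy and taking the Pauli matrices as the Hermitian orthogonal basis (for d = 2 and, via two-qubit Pauli strings, d = 4). The helper names are illustrative only.

```python
import itertools
import numpy as np

PAULIS = [
    np.eye(2, dtype=complex),
    np.array([[0, 1], [1, 0]], dtype=complex),
    np.array([[0, -1j], [1j, 0]], dtype=complex),
    np.array([[1, 0], [0, -1]], dtype=complex),
]

def swap_operator(d):
    """Matrix of the swap on two d-dimensional systems: |i, j> -> |j, i>."""
    f = np.zeros((d * d, d * d), dtype=complex)
    for i in range(d):
        for j in range(d):
            f[j * d + i, i * d + j] = 1.0
    return f

def swap_from_basis(basis, d):
    """(1/d) sum_p sigma_p (x) sigma_p for a basis with tr(sigma_p^2) = d."""
    return sum(np.kron(s, s) for s in basis) / d

one_qubit_basis = PAULIS                                    # d = 2
two_qubit_basis = [np.kron(s, t)                            # d = 4
                   for s, t in itertools.product(PAULIS, repeat=2)]

swap_2 = swap_from_basis(one_qubit_basis, 2)
swap_4 = swap_from_basis(two_qubit_basis, 4)
```

Any other Hermitian orthogonal basis with the same normalisation should give the same operator, which is the point of proving the lemma in full generality.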

1.2.3 Asymptotic Notation 

We will use the following standard asymptotic notation. 

Definition 1.2.3. $f(n) = O(g(n))$ if there exist $c, n_0 > 0$ such that $0 \le f(n) \le c\,g(n)$
for all $n \ge n_0$.

Definition 1.2.4. $f(n) = \Omega(g(n))$ if there exist $c, n_0 > 0$ such that $f(n) \ge c\,g(n) \ge 0$
for all $n \ge n_0$.

Definition 1.2.5. $f(n) = \Theta(g(n))$ if $f(n) = O(g(n))$ and $f(n) = \Omega(g(n))$.

Definition 1.2.6. $f(n) = o(g(n))$ if $\lim_{n \to \infty} f(n)/g(n) = 0$.

1.2.4 Norms and Superoperator Norms 
Norms 

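We will make heavy use of Schatten p-norms:

Definition 1.2.7. For A a d x d matrix, the Schatten p-norm is given by

$$\|A\|_p = \left( \sum_{i=1}^{d} \sigma_i^p \right)^{1/p}$$

where $\sigma_i$ are the singular values of A.

In particular, $\|A\|_1 = \sum_{i=1}^{d} \sigma_i = \operatorname{tr} \sqrt{A^\dagger A}$, $\|A\|_2 = \sqrt{\sum_{i=1}^{d} \sigma_i^2} = \sqrt{\operatorname{tr} A^\dagger A}$ and
$\|A\|_\infty = \max_i \sigma_i$.

These norms satisfy the following simple relationships:

$$\|A\|_2 \le \|A\|_1 \le \sqrt{d}\,\|A\|_2 \qquad (1.2.6)$$
$$\|A\|_\infty \le \|A\|_1 \le d\,\|A\|_\infty \qquad (1.2.7)$$
$$\|A\|_\infty \le \|A\|_2 \le \sqrt{d}\,\|A\|_\infty. \qquad (1.2.8)$$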
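The Schatten norms and the relationships (1.2.6) to (1.2.8) can be illustrated with a short sketch, assuming NumPy; `schatten_norm` is a hypothetical helper that works directly from the singular values.

```python
import numpy as np

def schatten_norm(a, p):
    """Schatten p-norm computed from the singular values of a."""
    s = np.linalg.svd(a, compute_uv=False)
    if np.isinf(p):
        return s.max()
    return (s**p).sum() ** (1.0 / p)

d = 4
rng = np.random.default_rng(0)
a = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))

norm_1 = schatten_norm(a, 1)         # trace norm
norm_2 = schatten_norm(a, 2)         # Frobenius (Hilbert-Schmidt) norm
norm_inf = schatten_norm(a, np.inf)  # operator norm
```

The 2-norm computed this way agrees with the Frobenius norm `np.linalg.norm(a)`, which gives an independent check of the helper.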



Superoperator Norms 

Just as state norms can be used to bound the distinguishability of states, superoper- 
ator norms bound how easy it is to tell different superoperators apart. We start with 
the 1-norm: 

Definition 1.2.8. The 1-norm of a superoperator $\mathcal{E}$ is given by

$$\|\mathcal{E}\|_{1 \to 1} = \sup_{X \ne 0} \frac{\|\mathcal{E}(X)\|_1}{\|X\|_1}.$$

The main problem with this definition is that the 1-norm is not stable under
tensoring with the identity i.e. there exist superoperators $\mathcal{E}$ (for example, differences
of two channels) with $\|\mathcal{E} \otimes \mathrm{id}_d\|_{1 \to 1} > \|\mathcal{E}\|_{1 \to 1}$,
where $\mathrm{id}_d$ is the identity channel on d dimensions. This means that some channels
are easier to distinguish by inputting entangled states. If the norm is to measure
the distinguishability of channels it should take this into account. To overcome this
problem, the diamond norm is defined:

Definition 1.2.9 ([KSV02]). The diamond norm of a superoperator $\mathcal{E}$ is given by

$$\|\mathcal{E}\|_\diamond = \sup_d \|\mathcal{E} \otimes \mathrm{id}_d\|_{1 \to 1} = \sup_d \sup_{X \ne 0} \frac{\|(\mathcal{E} \otimes \mathrm{id}_d)(X)\|_1}{\|X\|_1}.$$

It follows immediately that $\|\mathcal{E}\|_{1 \to 1} \le \|\mathcal{E}\|_\diamond$. Also it is shown in [KSV02] that the
diamond norm satisfies $\|\mathcal{E} \otimes \mathrm{id}_d\|_\diamond = \|\mathcal{E}\|_\diamond$ for all channels $\mathcal{E}$ and dimensions d and
that the dimension d in the supremum can be taken to be the same as the dimension
of the system $\mathcal{E}$ acts on. Operationally, the diamond norm of the difference between
two quantum operations tells us the largest possible probability of distinguishing the
two operations if we are allowed to have them act on part of an arbitrary, possibly
entangled, state.
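The instability of the 1-norm under tensoring with the identity can be seen in a standard example that is not taken from the text: the transpose map, which is a superoperator but not a channel. The sketch below, assuming NumPy, shows that transposition preserves the trace norm of a single-system input, while applying it to half of a maximally entangled state increases the trace norm by a factor of d. The helper names are illustrative.

```python
import numpy as np

def trace_norm(x):
    """Schatten 1-norm: the sum of the singular values."""
    return np.linalg.svd(x, compute_uv=False).sum()

def partial_transpose_first(x, d):
    """Apply the transpose map to the first of two d-dimensional systems."""
    t = x.reshape(d, d, d, d)              # indices (i_A, i_B, j_A, j_B)
    return t.transpose(2, 1, 0, 3).reshape(d * d, d * d)

d = 3
rng = np.random.default_rng(0)

# Unentangled input on a single system: the transpose keeps the singular values.
x = rng.normal(size=(d, d))
ratio_single = trace_norm(x.T) / trace_norm(x)

# Maximally entangled input: (T (x) id) maps the projector to swap / d.
phi = np.eye(d).reshape(d * d) / np.sqrt(d)
phi_proj = np.outer(phi, phi)
ratio_entangled = (trace_norm(partial_transpose_first(phi_proj, d))
                   / trace_norm(phi_proj))
```

The second ratio is only a lower bound on the norm of the extended map, but it already exceeds the value attained on the single system, which is the kind of gap the diamond norm is designed to capture.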

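We will also use the 2-norm:

Definition 1.2.10. The 2-norm of a superoperator $\mathcal{E}$ is given by

$$\|\mathcal{E}\|_{2 \to 2} = \sup_{X \ne 0} \frac{\|\mathcal{E}(X)\|_2}{\|X\|_2}.$$

In [van02] Appendix C, the following relationships between the superoperator
norms are proven:

$$\|\mathcal{E}\|_{2 \to 2} \le \sqrt{d}\,\|\mathcal{E}\|_{1 \to 1} \qquad (1.2.9)$$
$$\|\mathcal{E}\|_{1 \to 1} \le \sqrt{d}\,\|\mathcal{E}\|_{2 \to 2} \qquad (1.2.10)$$
$$\|\mathcal{E}\|_\diamond \le d\,\|\mathcal{E}\|_{1 \to 1} \qquad (1.2.11)$$
$$\|\mathcal{E}\|_\diamond \le d\,\|\mathcal{E}\|_{2 \to 2}. \qquad (1.2.12)$$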



1.3 Previous Publications 

The majority of this thesis has been published previously and some is work in collab- 
oration. 

Chapter [3] is joint work with Aram Harrow and is available as "Random Quantum
Circuits are Approximate 2-designs", Communications in Mathematical Physics,
Volume 291, Number 1, Pages 257-302. It is also available as a pre-print: arXiv:0802.1919.

Chapter [4] is also joint work with Aram Harrow and is available as "Efficient
Quantum Tensor Product Expanders and k-Designs", Proceedings of RANDOM 2009,
LNCS, Volume 5687, Pages 548-561. It is also available as a pre-print: arXiv:0811.2597.

Chapter [5] from Section [5.2] onwards is available as "Large deviation bounds for
k-designs", Proceedings of the Royal Society A, Volume 465, Number 2111, Pages
3289-3308. It is also available as a pre-print: arXiv:0903.5236.

Chapter [6] is available as "Learning and Testing Algorithms for the Clifford Group",
Physical Review A, Volume 80, Number 5, Page 052314. It is also available as a
pre-print: arXiv:0907.2833.
Part I

Quantum Pseudo-randomness

Chapter 2

Introduction to Quantum 
Pseudo-randomness 

Randomness is an important resource in both classical and quantum computing. It 
has applications in virtually all areas of computer science, including algorithms, cryp- 
tography and networking. Randomness can improve efficiency or, as in the case of 
cryptography, allow us to perform tasks that we would not be able to do with deter- 
ministic resources. 

An example of an algorithm where a randomised algorithm is faster than any 
known deterministic algorithm is polynomial identity testing. Here, the task is to 
determine if two polynomials are identically equal. By evaluating the polynomials on 
random inputs, identity testing can be done in polynomial time whereas no polynomial 
time deterministic algorithm is known. 
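A minimal sketch of randomised identity testing follows. It assumes the polynomials are given as Python functions with integer coefficients and evaluates them at random points modulo a large prime; the prime, the trial count and the function names are arbitrary illustrative choices, not details from the text.

```python
import random

PRIME = 2**61 - 1  # a Mersenne prime used as the field size (an arbitrary choice)

def probably_identical(f, g, num_vars, trials=20, rng=None):
    """Return False if f and g differ at some random point, True otherwise."""
    rng = rng or random.Random()
    for _ in range(trials):
        point = [rng.randrange(PRIME) for _ in range(num_vars)]
        if f(*point) % PRIME != g(*point) % PRIME:
            return False   # a witness point: the polynomials are certainly different
    return True            # no witness found: identical with high probability

def lhs(x, y):
    return (x + y) ** 2

def rhs(x, y):
    return x * x + 2 * x * y + y * y

def wrong(x, y):
    return x * x + y * y

same = probably_identical(lhs, rhs, 2, rng=random.Random(1))
different = probably_identical(lhs, wrong, 2, rng=random.Random(1))
```

The error is one-sided: an answer of `False` is always correct, while `True` can be wrong only if every random point happens to be a root of the difference polynomial.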

Also, a commonly used randomised algorithm is that of randomised quicksort. In 
quicksort, a pivot element is chosen and elements smaller than this are placed to the 
left and larger elements to the right. Then these two parts are sorted recursively. 
However, the choice of pivot element greatly affects the run-time of the algorithm. If 
chosen poorly (for example so that there is only one element smaller than the pivot), 
the algorithm runs in O(n^2) time. If chosen well, the algorithm runs in O(n log n) time.
Choosing the pivot element randomly will be a good choice on average, giving expected
run-time O(n log n) [MR95]. However, this run-time can be achieved deterministically
using deterministic median finding [BFP+72] but in practice the randomised method
is more efficient.
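The randomised pivot choice can be sketched in a few lines. This is a simple out-of-place version written for clarity rather than speed; the function name and the optional `rng` argument are illustrative.

```python
import random

def randomised_quicksort(items, rng=None):
    """Sort a list by recursively partitioning around a uniformly random pivot."""
    rng = rng or random.Random()
    if len(items) <= 1:
        return list(items)
    pivot = items[rng.randrange(len(items))]
    smaller = [x for x in items if x < pivot]
    equal = [x for x in items if x == pivot]
    larger = [x for x in items if x > pivot]
    return (randomised_quicksort(smaller, rng)
            + equal
            + randomised_quicksort(larger, rng))

data = [5, 3, 8, 1, 9, 2, 7, 3]
result = randomised_quicksort(data, random.Random(0))
```

Because the pivot is random, no fixed input ordering (such as already sorted data) consistently triggers the quadratic worst case.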

As another example, many primality testing algorithms are randomised because of 
their simplicity, even though a deterministic polynomial-time algorithm is now known. 
Also, in the field of communication complexity, separations between deterministic and 
randomised algorithms can be proven. The deterministic complexity of evaluating the 
equality function (to determine if Alice and Bob's strings are equal) is Θ(n), whereas
the randomised complexity is Θ(log n) [KN96]. As yet another example of randomness
in classical computer science, in networking a random delay is often inserted after a
collision so that the nodes wait for different times and are likely to avoid another collision.

In this part, we seek to extend some of these gains of using randomness to quan- 
tum computing. We wish to find applications of randomness to find new quantum 
algorithms and constructions. 

Besides the computer science applications, there are also physical reasons for 
studying randomness in quantum mechanics. Some systems can be modelled as inter- 
acting randomly and it is interesting to ask what the limiting state (or distribution on 
states) is for such a system. Also of great interest is how quickly the system reaches 
this stationary state. If the time taken grows too quickly with the size of the system 
(for example, exponentially) then for any system apart from the most trivial, the sta- 
tionary state will never be reached and will not be seen in physical systems. However, 
if the time is small (for example, a small polynomial), then the stationary state can 
be reached quickly and will be observed in real systems. It is in problems like this 
that physicists must consider the computer science aspects of their models. We study 
problems of this kind in Chapters [3] and [5].

2.1 Random Unitaries 

In quantum computing, operations are unitary gates and randomness is often used 
in the form of random unitary operations. Random unitaries have algorithmic uses 
(e.g. [Sen05]), cryptographic applications (e.g. [AS04, HLSW04]) and applications to
fundamental quantum protocols (e.g. [BHL+05, HHL04]). For information-theoretic
applications, it is often convenient to use unitary matrices drawn from the uniform
distribution on the unitary group, also known as the Haar measure. This measure is
the unique unitarily invariant measure i.e. the only measure dU on the unitary group
U(d) where $\int_{U(d)} f(U)\,dU = \int_{U(d)} f(UV)\,dU$ for all functions f and unitaries V. For
random states, we write the unitarily invariant measure on d-dimensional states as
$d\psi$. This can be thought of as a Haar distributed unitary applied to any fixed pure
state. It is also known as the measure induced by the Fubini-Study metric.
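For small dimensions, Haar distributed unitaries can be sampled numerically. The sketch below uses a standard recipe that is not described in the text: take the QR decomposition of a matrix of independent complex Gaussians and fix the phases of the diagonal of R. It assumes NumPy, and the function name `haar_unitary` is illustrative.

```python
import numpy as np

def haar_unitary(d, rng):
    """Sample a d x d unitary from the Haar measure via a QR decomposition."""
    z = (rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    # Multiply column j of q by the phase of r[j, j] to remove the ambiguity
    # in the QR decomposition; without this the distribution is not Haar.
    phases = np.diag(r) / np.abs(np.diag(r))
    return q * phases

d = 2
rng = np.random.default_rng(1)
u = haar_unitary(d, rng)

# Monte Carlo estimate of E |U_00|^2, which equals 1/d under the Haar measure.
samples = [abs(haar_unitary(d, rng)[0, 0]) ** 2 for _ in range(2000)]
estimate = float(np.mean(samples))
```

The cost of this procedure grows polynomially in d, and so exponentially in the number of qubits.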

However, in both classical and quantum computing, obtaining random bits is often
expensive, and so it is often desirable to minimise their use. For example, in classical
computing, expanders (discussed in Chapter [4]) and k-wise independent functions (see
Section [2.2]) have been developed for this purpose and have found wide application.
We will spend a great deal of time exploring quantum analogues of these: quantum
expanders and k-designs.

In addition to randomness being expensive, there is an even more pressing
problem when using random unitaries and states. An n-qubit unitary is defined by $4^n$ real
parameters, and so cannot even be approximated efficiently using a subexponential
amount of time or randomness. So any application that requires a random unitary
cannot be efficient. Instead, we will seek to construct efficient pseudo-random
ensembles of unitaries which resemble the Haar measure for certain applications. For
example, a k-design (often referred to as a t-design, or a (k, k)-design), as mentioned
above, is a distribution on unitaries which matches the first k moments of the Haar
distribution. k-designs have found many uses which are explored in Chapter [5].

In Section [2.2] we formally define k-designs and summarise known constructions.
Then in Chapter [3] we show that, for a natural model of a random quantum circuit, the
distribution quickly converges to that of a 2-design. This gives an efficient approximate
2-design construction and also has physical applications. In Chapter [4] we provide an
efficient construction of a unitary k-design for any k (although there are restrictions
on the dimension, see later). Then in Chapter [5] we discuss applications of designs,
including to derandomising constructions that use large deviation bounds.

Parts of this chapter have been published previously in [HL09b, HL09a, Low09a]
and parts are joint work with Aram Harrow.

2.2 k-designs

A unitary k-design is a distribution of unitaries that gives the same expectations
of polynomials of degree at most k as the Haar measure. This is just like Gaussian
quadrature, where integrals of polynomials are calculated by sums. Gaussian
quadrature says that there exist b sample points $\{x_i\}$ and weights $\{w_i\}$ so that for all
polynomials f of degree at most 2b - 1,

$$\int_p^q f(x)\,dx = \sum_{i=1}^{b} w_i f(x_i)$$

for some fixed limits p and q. This allows the integrals to be calculated much more
efficiently. A unitary k-design is the same, except the polynomial is on elements of
unitary matrices from the unitary group rather than numbers on the real line. The k
refers to the degree of the polynomial. We will also discuss state designs, where the
function is on coefficients of states rather than unitaries.
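The Gaussian quadrature analogy can be made concrete with NumPy's Gauss-Legendre rule, which uses the fixed limits -1 and 1. The sketch below is only an illustration of exactness up to a given degree; the helper name is hypothetical.

```python
import numpy as np

def gauss_legendre_integral(f, num_points):
    """Approximate the integral of f over [-1, 1] with a Gauss-Legendre rule."""
    nodes, weights = np.polynomial.legendre.leggauss(num_points)
    return float(np.sum(weights * f(nodes)))

# Three points are exact for polynomials of degree at most 2 * 3 - 1 = 5.
approx_deg4 = gauss_legendre_integral(lambda x: x**4, 3)   # exact value 2/5
approx_deg6 = gauss_legendre_integral(lambda x: x**6, 3)   # exact value 2/7, not reproduced
```

In the same way, a k-design reproduces Haar averages of polynomials up to degree k but in general not beyond.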

2.2.1 k-wise Independence

k-designs can also be thought of as a quantum analogue of k-wise independence. A
sequence of random variables $X_1, \ldots, X_n$ is k-wise independent if, for any subset of
size $j \le k$,

$$\mathbb{P}(X_{i_1} = x_1, \ldots, X_{i_j} = x_j) = \mathbb{P}(X_{i_1} = x_1) \cdots \mathbb{P}(X_{i_j} = x_j). \qquad (2.2.1)$$

As a simple example of how this can save randomness, consider the set

{000,011,101,110}. (2.2.3) 

If an element is chosen uniformly at random from this set, the probability distribution 
of the values of any two bits is the same as if all three bits were chosen independently. 
This is therefore a 2- wise independent set, and saves one bit of randomness. In general, 
if $k \ll n$, an exponential saving in randomness can be made in this way. Efficient
constructions of exactly k-wise independent sets are known [ABI86] and more efficient
approximate constructions are given in [NN90].
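The 2-wise independence of the set (2.2.3) can be verified by brute force. The sketch below checks that, for a uniformly random element of the set, every choice of k positions is uniformly distributed over all $2^k$ bit patterns; the function name is illustrative.

```python
from collections import Counter
from itertools import combinations

STRINGS = ["000", "011", "101", "110"]

def is_k_wise_independent(strings, k):
    """Check that any k positions of a uniformly random string are uniform bits."""
    n = len(strings[0])
    for positions in combinations(range(n), k):
        counts = Counter(tuple(s[i] for i in positions) for s in strings)
        # All 2^k patterns must occur, and all equally often.
        if len(counts) != 2**k or len(set(counts.values())) != 1:
            return False
    return True

pairwise = is_k_wise_independent(STRINGS, 2)    # any two bits look independent
threewise = is_k_wise_independent(STRINGS, 3)   # all three bits do not
```

Checking subsets of size exactly k suffices, since uniformity on k positions implies uniformity on any smaller subset by marginalising.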

A related concept is that of k-wise independent permutations. These are sets of
permutations with the property that, when a permutation is chosen randomly from
this set and applied to n points, the distribution of the positions of any k points is
the same as if a uniformly random permutation was applied. For example, a random
cyclic shift is a 1-wise independent permutation. Again, an exponential saving of
randomness is possible [KNROD].
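The cyclic shift example can also be checked by brute force. The sketch below, with illustrative function names, tests whether the images of any k distinct points are uniformly distributed over all ordered k-tuples of distinct positions, as they are for a uniformly random permutation. It uses `math.perm`, so it assumes Python 3.8 or later.

```python
import math
from collections import Counter
from itertools import permutations

def cyclic_shifts(n):
    """All n cyclic shifts of n points, each as a tuple i -> (i + s) mod n."""
    return [tuple((i + s) % n for i in range(n)) for s in range(n)]

def is_k_wise_independent(perms, k):
    """Check that the images of any k distinct points are uniformly distributed."""
    n = len(perms[0])
    num_patterns = math.perm(n, k)   # ordered k-tuples of distinct positions
    for points in permutations(range(n), k):
        counts = Counter(tuple(p[i] for i in points) for p in perms)
        if len(counts) != num_patterns or len(set(counts.values())) != 1:
            return False
    return True

shifts = cyclic_shifts(4)
one_wise = is_k_wise_independent(shifts, 1)   # a single point lands uniformly
two_wise = is_k_wise_independent(shifts, 2)   # pairs of points are correlated
```

The n cyclic shifts use log n random bits instead of the log n! bits needed for a uniformly random permutation.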

We seek to construct quantum k-designs to achieve a similar saving of randomness
for quantum algorithms. We now formally define k-designs.

2.2.2 Exact Designs 

We will use the following notation to distinguish the measure we are using. Write $\mathbb{E}$
for the expectation with $\mathbb{E}_{U \sim \nu}$ meaning the expectation when U is chosen from the
measure $\nu$. If the measure is the Haar measure in dimension d we will write $\mathbb{E}_{U \sim U(d)}$.
We use the same subscripts for probabilities so $\mathbb{P}_{U \sim U(d)}$ denotes the probability when
U is chosen from the Haar measure, etc. When considering random states, we will
write $\mathbb{E}_{|\psi\rangle \sim S(d)}$, etc.
State designs

A k-design is an ensemble of states such that, when one state is chosen from the
ensemble and copied k times, it is indistinguishable from a uniformly random state.
The state k-design definition we use is due to Ambainis and Emerson [AE07]:

Definition 2.2.1 ([AE07], Definition 1). An ensemble of quantum states $\nu = \{p_i, |\psi_i\rangle\}$
is a state k-design if

$$\mathbb{E}_{|\psi\rangle \sim \nu} (|\psi\rangle\langle\psi|)^{\otimes k} = \mathbb{E}_{|\psi\rangle \sim S(d)} (|\psi\rangle\langle\psi|)^{\otimes k}. \qquad (2.2.4)$$



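We can evaluate the integral on the right hand side:

Lemma 2.2.2.

$$\int_\psi (|\psi\rangle\langle\psi|)^{\otimes k}\, d\psi = \frac{\Pi_{+k}}{\binom{k+d-1}{k}} \qquad (2.2.5)$$

where $\Pi_{+k}$ is the projector onto the symmetric subspace of k d-dimensional spaces.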



Proof. The standard proof (see e.g. [GW98] or [BBD+97]) involves showing that
$\int_\psi (|\psi\rangle\langle\psi|)^{\otimes k}\, d\psi$ commutes with all elements of an irreducible representation (irrep)
of the unitary group that acts on the symmetric subspace so by Schur's lemma must
be proportional to the projector onto the symmetric subspace. However, here we give
an alternative proof that introduces a technique we will use later.

By the unitary invariance of the Haar measure, $\int_\psi (|\psi\rangle\langle\psi|)^{\otimes k}\, d\psi$ commutes with
$U^{\otimes k}$ for all unitaries U. By Schur-Weyl duality (see e.g. [GW98]), this implies that
the integral is a linear combination of subsystem permutation operators. Therefore
we have

$$\int_\psi (|\psi\rangle\langle\psi|)^{\otimes k}\, d\psi = \sum_{\pi \in S_k} \alpha_\pi S(\pi) \qquad (2.2.6)$$

for some coefficients $\alpha_\pi$. However, the integral is invariant under permutations so
$\alpha_\pi$ must be the same for all permutations $\pi$. Using $\Pi_{+k} = \frac{1}{k!} \sum_{\pi \in S_k} S(\pi)$ and
finding the normalisation by taking the trace (the dimension of the symmetric
subspace is $\binom{k+d-1}{k}$) proves the result. □
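The value in Lemma 2.2.2 can be checked numerically in a small case. The sketch below assumes the standard fact, not stated in the text, that the six eigenstates of the single-qubit Pauli matrices form a state 2-design, so their two-copy average should equal $\Pi_{+2} / \binom{3}{2}$ for d = 2. It assumes NumPy and Python 3.8 or later (for `math.comb`); all helper names are illustrative.

```python
import math
import numpy as np

def symmetric_projector_two_copies(d):
    """Projector onto the symmetric subspace of two d-dimensional systems."""
    swap = np.zeros((d * d, d * d), dtype=complex)
    for i in range(d):
        for j in range(d):
            swap[j * d + i, i * d + j] = 1.0
    return (np.eye(d * d) + swap) / 2

def k_copy_average(states, k):
    """Uniform average of (|psi><psi|)^{(x) k} over a list of state vectors."""
    total = 0
    for psi in states:
        proj = np.outer(psi, psi.conj())
        term = proj
        for _ in range(k - 1):
            term = np.kron(term, proj)
        total = total + term
    return total / len(states)

s = 1 / np.sqrt(2)
pauli_eigenstates = [
    np.array([1, 0], dtype=complex), np.array([0, 1], dtype=complex),   # Z basis
    np.array([s, s], dtype=complex), np.array([s, -s], dtype=complex),  # X basis
    np.array([s, 1j * s]), np.array([s, -1j * s]),                      # Y basis
]

d, k = 2, 2
haar_value = symmetric_projector_two_copies(d) / math.comb(k + d - 1, k)
design_average = k_copy_average(pauli_eigenstates, k)
basis_average = k_copy_average(pauli_eigenstates[:2], k)   # Z basis only
```

The full set of six states reproduces the Haar value, whereas the computational basis alone does not, although it does reproduce the one-copy average I/2.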

We will also state equivalent definitions of designs in terms of polynomials of
matrix elements of the unitary or coefficients of the state. First we must define what
we mean by the degree of a polynomial:

Definition 2.2.3. A monomial in elements of a matrix U or state $|\psi\rangle$ is of degree
$(k_1, k_2)$ if it contains $k_1$ conjugated elements and $k_2$ unconjugated elements. We call
it balanced if $k_1 = k_2$ and will simply say a balanced monomial has degree k if it is
degree (k, k). A balanced polynomial is of degree k if it is a sum of balanced monomials
of degree at most k, with at least one monomial with degree equal to k.

So, in this definition, $U_{pq} U^*_{rs}$ is a balanced monomial of degree (1, 1) and
$U_{pq} U_{rs}$ is a monomial of degree (0, 2) and is unbalanced. For the state $|\psi\rangle = \sum_i \alpha_i |i\rangle$,
$\alpha_i \alpha_j^*$ is a balanced monomial of degree (1, 1).

We can then define state k-designs in terms of monomials:

Definition 2.2.4 ([AE07], Definition 3). An ensemble of quantum states $\nu$ is a state
k-design if, for all balanced monomials M of degree at most k,

$$\mathbb{E}_{|\psi\rangle \sim \nu} M(\psi) = \mathbb{E}_{|\psi\rangle \sim S(d)} M(\psi). \qquad (2.2.7)$$

This is an equivalent definition to Definition [2.2.1]:

Lemma 2.2.5 ([AE07], Theorem 5). The state design definitions [2.2.1] and [2.2.4] are
equivalent.

Proof. Firstly, we only need to prove the result for M of degree exactly k, since by
partial tracing this implies the result for any smaller k.

Each entry in the matrix $\mathbb{E}_{|\psi\rangle \sim \nu} (|\psi\rangle\langle\psi|)^{\otimes k}$ is the expectation of a monomial of
degree k, with the state chosen from the design. Further, the corresponding entry
in $\mathbb{E}_{|\psi\rangle \sim S(d)} (|\psi\rangle\langle\psi|)^{\otimes k}$ is the expectation of the same monomial but with the state
chosen from the Haar measure. If the ensemble of states satisfies Definition [2.2.4] then
these are equal, so the ensemble also satisfies Definition [2.2.1].

On the other hand, for every balanced monomial of degree k, there is an entry
in $\mathbb{E}\,(|\psi\rangle\langle\psi|)^{\otimes k}$ equal to its expectation. Therefore, if the ensemble of states
satisfies Definition [2.2.1] then it also satisfies Definition [2.2.4]. □
Unitary designs

Consider having k d-dimensional systems in any initial state. A unitary k-design is
an ensemble of unitaries such that when a unitary is randomly selected from it and
applied to each of the k systems, the overall state is indistinguishable from choosing
a uniformly random unitary. This can be seen as a generalisation of state designs in
that any column of a unitary k-design is a state k-design. Formally, we have:

Definition 2.2.6. Let $\nu$ be an ensemble of unitary operators. Define

$$\mathcal{G}_\nu(\rho) = \mathbb{E}_{U \sim \nu} \left[ U^{\otimes k} \rho\, (U^\dagger)^{\otimes k} \right] \qquad (2.2.8)$$

and

$$\mathcal{G}_H(\rho) = \mathbb{E}_{U \sim U(d)} \left[ U^{\otimes k} \rho\, (U^\dagger)^{\otimes k} \right]. \qquad (2.2.9)$$

Then the ensemble is a unitary k-design if $\mathcal{G}_\nu(\rho) = \mathcal{G}_H(\rho)$ for all $d^k \times d^k$ matrices $\rho$
(not necessarily physical states).

For convenience we have defined this for all matrices $\rho$ although it is equivalent
to only require equality for physical states, since all matrices can be obtained from
linear combinations of physical states.

Like state designs, unitary designs can also be defined in terms of polynomials:

Definition 2.2.7 ([DCEL06]). $\nu$ is a unitary k-design if, for all balanced monomials
M of degree k,

$$\mathbb{E}_{U \sim \nu} M(U) = \mathbb{E}_{U \sim U(d)} M(U). \qquad (2.2.10)$$

Again, these definitions are equivalent:

Lemma 2.2.8. The unitary design definitions [2.2.6] and [2.2.7] are equivalent.

Proof. The proof is very similar to the state design case. Again, we only consider
monomials of degree k since by partial tracing this implies the result for smaller k.

Consider matrices $\rho$ of the form $|i_1, i_2, \ldots, i_k\rangle\langle j_1, \ldots, j_k|$ in Definition [2.2.6].
Then each element of $U^{\otimes k} \rho\, (U^\dagger)^{\otimes k}$ is a balanced monomial of degree k and, for some
choice of indices in $|i_1, i_2, \ldots, i_k\rangle\langle j_1, j_2, \ldots, j_k|$, each balanced monomial of degree k
appears. □

2.2.3 Approximate k-designs

While exact designs have desirable properties, it is often much easier to construct
approximate designs which, for many applications, are sufficient. Also, approximate
designs can have fewer unitaries than exact designs. For example, it was shown in
[AMTd00] that $2^{2n}$ unitaries are necessary and sufficient for an exact unitary 1-design.
However, an approximate 1-design can be implemented with only $2^{n+o(n)}$ unitaries
which gives almost a factor of 2 saving in random bits.
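One ensemble with exactly $2^{2n}$ elements that forms an exact unitary 1-design is the set of n-qubit Pauli matrices. This is a standard fact assumed here rather than taken from the text. The sketch below, assuming NumPy, checks that averaging $\sigma_p \rho \sigma_p^\dagger$ over all Pauli strings gives $\operatorname{tr}(\rho)\, I / 2^n$, which is the Haar average of $U \rho U^\dagger$.

```python
import itertools
import numpy as np

PAULIS = [
    np.eye(2, dtype=complex),
    np.array([[0, 1], [1, 0]], dtype=complex),
    np.array([[0, -1j], [1j, 0]], dtype=complex),
    np.array([[1, 0], [0, -1]], dtype=complex),
]

def pauli_twirl(rho, n):
    """Average of sigma_p rho sigma_p^dagger over all 4^n Pauli strings p."""
    d = 2**n
    total = np.zeros((d, d), dtype=complex)
    for p in itertools.product(range(4), repeat=n):
        sigma = np.eye(1, dtype=complex)
        for index in p:
            sigma = np.kron(sigma, PAULIS[index])
        total += sigma @ rho @ sigma.conj().T
    return total / 4**n

n = 2
rng = np.random.default_rng(0)
rho = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))  # any matrix, not only states

averaged = pauli_twirl(rho, n)
haar_average = np.trace(rho) * np.eye(2**n) / 2**n
```

Choosing a uniformly random Pauli string costs 2n random bits, which is the baseline that the approximate construction mentioned above nearly halves.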

Approximate state designs 

Our approximate state design definition is as follows:

Definition 2.2.9. $\nu$ is an $\epsilon$-approximate state k-design if

$$\left\| \mathbb{E}_{|\psi\rangle \sim \nu} (|\psi\rangle\langle\psi|)^{\otimes k} - \mathbb{E}_{|\psi\rangle \sim S(d)} (|\psi\rangle\langle\psi|)^{\otimes k} \right\|_\infty \le \frac{\epsilon}{\binom{k+d-1}{k}}. \qquad (2.2.11)$$

The factor $\binom{k+d-1}{k}$ appears because it is the dimension of the symmetric subspace. In [AE07],
a similar definition was proposed but with the additional requirement that the
ensemble also forms a 1-design (exactly), i.e.

$$\mathbb{E}_{|\psi\rangle \sim \nu} |\psi\rangle\langle\psi| = \mathbb{E}_{|\psi\rangle \sim S(d)} |\psi\rangle\langle\psi|.$$

This requirement was necessary there only so that a suitably normalised version of
the ensemble would form a POVM. We will not use it.

By taking the partial trace one can show that a k-design is a k'-design for $k' \le k$.
Thus approximate k-designs are always at least approximate 1-designs.
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Approximate unitary designs 



We have many choices to make when defining an approximate design. Here we give 
four definitions which are convenient in different contexts. In Lemma 12.2.141 we show 
that they are all equivalent, up to polynomial dimension factors. 

If the unitary design is considered a quantum channel that applies a random 
unitary from the distribution to the input, then a relevant measure is the diamond 
norm difference between the approximate design and an exact design. Because the 
diamond norm is related to the distinguishability of channels, having a low diamond 
norm distance means that it is difficult to detect that an approximate design was 
given rather than exact. One approximate design definition is therefore: 

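Definition 2.2.10 (DIAMOND¹, See Chapter 3). $\nu$ is an $\epsilon$-approximate unitary k-design if

$$\left\| \mathcal{G}_\nu - \mathcal{G}_H \right\|_\diamond \le \epsilon, \tag{2.2.12}$$

where $\mathcal{G}_\nu$ and $\mathcal{G}_H$ are defined in Definition 2.2.6.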



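In [DCEL06], they consider approximate twirling, which is implemented using an approximate 2-design. They give an alternative definition of closeness which is more convenient for this application:

Definition 2.2.11 (TWIRL, [DCEL06]). $\nu$ is an $\epsilon$-approximate twirl if

$$\max_\Lambda \left\| \mathbb{E}_{U \sim \nu}\, U^\dagger \Lambda(U \rho U^\dagger) U - \mathbb{E}_{U \sim U(d)}\, U^\dagger \Lambda(U \rho U^\dagger) U \right\|_\diamond \le \frac{\epsilon}{d^2}. \tag{2.2.13}$$

The maximisation is over channels $\Lambda$ and $d$ is the dimension.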

¹We name the definitions to help distinguish them.

In Chapter 4 unitary designs are constructed from quantum tensor product expanders. A quantum k-TPE is defined as an ensemble $\nu$ of unitaries such that

$$\left\| \mathbb{E}_{U \sim \nu} U^{\otimes k,k} - \mathbb{E}_{U \sim U(d)} U^{\otimes k,k} \right\|_\infty \le \lambda \tag{2.2.14}$$

for $\lambda < 1$ and $U^{\otimes k,k} = U^{\otimes k} \otimes (U^*)^{\otimes k}$. The motivation for this definition is explained in Chapter 4. From this a natural k-design definition follows:

Definition 2.2.12 (TRACE, See Chapter 4). $\nu$ is an $\epsilon$-approximate unitary k-design if

$$\left\| \mathbb{E}_{U \sim \nu} U^{\otimes k,k} - \mathbb{E}_{U \sim U(d)} U^{\otimes k,k} \right\|_1 \le \epsilon. \tag{2.2.15}$$

In Theorem 4.1.3 we prove the simple result that a unitary design can be constructed by iterating the TPE.

We will also need a definition in terms of monomials: 

Definition 2.2.13 (MONOMIAL, See Chapter 5). $\nu$ is an $\epsilon$-approximate unitary k-design if, for all balanced monomials $M$ of degree $\le k$,

$$\left| \mathbb{E}_{U \sim \nu} M(U) - \mathbb{E}_{U \sim U(d)} M(U) \right| \le \frac{\epsilon}{d^k}. \tag{2.2.16}$$

We would now like to show that all these definitions are equivalent. By equivalent, we mean that, if $\nu$ is an $\epsilon$-approximate unitary design by one definition, then it is an $\epsilon'$-approximate unitary design by any other definition, where $\epsilon' = \mathrm{poly}(d^k)\,\epsilon$.



Lemma 2.2.14. Definitions 2.2.10 (DIAMOND), 2.2.12 (TRACE) and 2.2.13 (MONOMIAL) are all equivalent. Also Definition 2.2.11 (TWIRL) is equivalent to the other definitions for an approximate 2-design only.

Proof. To prove this, we will consider yet another possible definition (OPERATOR-2-NORM):

$$\left\| \mathcal{G}_\nu - \mathcal{G}_H \right\|_{2 \to 2} \le \epsilon. \tag{2.2.17}$$

Note that this is equivalent to

$$\left\| \mathbb{E}_{U \sim \nu} U^{\otimes k,k} - \mathbb{E}_{U \sim U(d)} U^{\otimes k,k} \right\|_\infty \le \epsilon \tag{2.2.18}$$

which is the same as Definition 2.2.12 (TRACE) except the norm is the $\infty$-norm rather than the 1-norm. We shall prove that the other k-design definitions are equivalent to this. We then show that Definition 2.2.11 (TWIRL) is equivalent to Definition 2.2.13 (MONOMIAL) for k = 2. We use the notation $A \xrightarrow{s} B$ to mean that if $\nu$ is an $\epsilon$-approximate unitary k-design according to definition A then it is an $s\epsilon$-approximate unitary k-design according to definition B. If s = 1 we omit the superscript.

A diagram showing the different parts to the proof is given in Figure 2.1. We remark that direction 2 is unneeded but is included since it provides tighter bounds and has a simple proof.




Figure 2.1: A diagram showing the different parts of the proof of Lemma 2.2.14. The dotted arrows show the correspondence is only for k = 2. The circled digits refer to the enumerated items below and the factors by the arrows indicate the precision lost in the approximation when converting between the definitions.



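1. OPERATOR-2-NORM → TRACE:

Use the equivalence between Equations 2.2.17 and 2.2.18 and that $\|A\|_1 \le d^{2k} \|A\|_\infty$ for any $d^{2k} \times d^{2k}$ matrix $A$.

2. TRACE → OPERATOR-2-NORM:

Use $\|A\|_\infty \le \|A\|_1$ and the equivalence between Equations 2.2.17 and 2.2.18.

3. MONOMIAL → OPERATOR-2-NORM:

Choose any $\rho \in \mathbb{C}^{d^k \times d^k}$ and write it as $\rho = \sum_{ij} \rho_{ij} |i\rangle\langle j|$. Then

$$\|\mathcal{G}_\nu(\rho) - \mathcal{G}_H(\rho)\|_2 \le \sum_{ij} |\rho_{ij}| \, \|\mathcal{G}_\nu(|i\rangle\langle j|) - \mathcal{G}_H(|i\rangle\langle j|)\|_2 = \sum_{ij} |\rho_{ij}| \sqrt{\sum_{kl} \left| \langle k| \left( \mathcal{G}_\nu(|i\rangle\langle j|) - \mathcal{G}_H(|i\rangle\langle j|) \right) |l\rangle \right|^2}$$

using the fact that the 2-norm squared is the sum of the squares of the matrix elements. Now, we have a bound on the matrix elements of $\mathcal{G}_\nu(|i\rangle\langle j|) - \mathcal{G}_H(|i\rangle\langle j|)$ from Definition 2.2.13 (MONOMIAL):

$$\left| \langle k| \left( \mathcal{G}_\nu(|i\rangle\langle j|) - \mathcal{G}_H(|i\rangle\langle j|) \right) |l\rangle \right| \le \frac{\epsilon}{d^k}$$

so

$$\|\mathcal{G}_\nu(\rho) - \mathcal{G}_H(\rho)\|_2 \le \epsilon \sum_{ij} |\rho_{ij}| \le d^k \epsilon \|\rho\|_2.$$

4. OPERATOR-2-NORM → DIAMOND:

This follows from the superoperator norm relationship given in Eqn. 1.2.12.

5. DIAMOND → OPERATOR-2-NORM:

This uses the operator norm inequalities $\|\phi\|_{1 \to 1} \le \|\phi\|_\diamond$ and Eqn. 1.2.9.

6. TRACE → MONOMIAL:

Let $M$ be a balanced monomial of degree $k$ and write it as

$$M = U_{q_1 p_1} \cdots U_{q_k p_k} U^*_{s_1 r_1} \cdots U^*_{s_k r_k}.$$

Then let $\hat{M} = |p_1\rangle\langle q_1| \otimes \cdots \otimes |p_k\rangle\langle q_k| \otimes |r_1\rangle\langle s_1| \otimes \cdots \otimes |r_k\rangle\langle s_k|$. Then $M(U) = \operatorname{tr} \hat{M} U^{\otimes k,k}$ and $\|\hat{M}\|_\infty = 1$. Now we use the fact that for any operator $A$

$$\|A\|_1 = \max\{ |\operatorname{tr} AB| : \|B\|_\infty \le 1 \} \tag{2.2.19}$$

to rewrite the TRACE definition:

$$\max\left\{ \left| \operatorname{tr} \left( \mathbb{E}_{U \sim \nu}[U^{\otimes k,k}] - \mathbb{E}_{U \sim U(d)}[U^{\otimes k,k}] \right) B \right| : \|B\|_\infty \le 1 \right\} \ge \left| \operatorname{tr} \left( \mathbb{E}_{U \sim \nu}[U^{\otimes k,k}] - \mathbb{E}_{U \sim U(d)}[U^{\otimes k,k}] \right) \hat{M} \right| = \left| \mathbb{E}_{U \sim \nu} M(U) - \mathbb{E}_{U \sim U(d)} M(U) \right|.$$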

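7. MONOMIAL → TWIRL (for k = 2):

Write $\Lambda(\rho)$ in the Kraus decomposition as

$$\Lambda(\rho) = \sum_k A_k \rho A_k^\dagger \tag{2.2.20}$$

with

$$\sum_k A_k^\dagger A_k = I. \tag{2.2.21}$$

Let $\Lambda_U(\rho) = \sum_k U^\dagger A_k U \rho U^\dagger A_k^\dagger U$. Then the $p, q$ matrix element of $\Lambda_U(\rho)$ is

$$\sum_{krstuij} \rho_{ij} U_{si} U_{uq} U^*_{rp} U^*_{tj} A_{krs} A^*_{kut}. \tag{2.2.22}$$

From Definition 2.2.13 (MONOMIAL) we have that

$$\left| (\mathbb{E}_{U \sim \nu} - \mathbb{E}_{U \sim U(d)}) U_{si} U_{uq} U^*_{rp} U^*_{tj} \right| \le \epsilon / d^2$$

(treating expectation as an operator). This implies that

$$\left| \sum_{krstuij} \rho_{ij} A_{krs} A^*_{kut} (\mathbb{E}_{U \sim \nu} - \mathbb{E}_{U \sim U(d)}) U_{si} U_{uq} U^*_{rp} U^*_{tj} \right| \le \sum_{krstuij} \left| \rho_{ij} A_{krs} A^*_{kut} \right| \epsilon / d^2. \tag{2.2.23}$$

Now, $\sum_{rs} |A_{krs}| \le d \|A_k\|_2$ and $\sum_{ij} |\rho_{ij}| \le d \|\rho\|_2$ and, taking the trace of the normalisation condition Eqn. 2.2.21, we find

$$d = \sum_k \operatorname{tr} A_k^\dagger A_k = \sum_k \|A_k\|_2^2.$$

So we find Eqn. 2.2.23 is upper bounded by

$$\frac{\epsilon}{d^2} \, d \|\rho\|_2 \sum_k d^2 \|A_k\|_2^2 \le \epsilon d^2 \|\rho\|_2.$$

Using the fact that the 2-norm squared is the sum of the squares of the matrix elements we find that

$$\left\| \mathbb{E}_{U \sim \nu} \Lambda_U(\rho) - \mathbb{E}_{U \sim U(d)} \Lambda_U(\rho) \right\|_2 \le \epsilon d^3 \|\rho\|_2.$$

Using $\|\cdot\|_\diamond \le d \|\cdot\|_2$ (Eqn. 1.2.12) we prove the result.

8. TWIRL → MONOMIAL for k = 2:

Let $A_\sigma = |p\rangle\langle q| + \sigma |r\rangle\langle s|$ where $\sigma \in \{+1, -1, +i, -i\}$. Let $B = I - |q\rangle\langle q| - |s\rangle\langle s|$. Then $A_\sigma$ and $B$ are the Kraus operators of a valid channel, provided $p \ne r$, which we assume for now. Further, let

$$\Lambda_{U,\sigma}(\rho) = U^\dagger \Lambda_\sigma(U \rho U^\dagger) U \tag{2.2.24}$$

where $\Lambda_\sigma$ is the channel with Kraus operators $A_\sigma$ and $B$. Now let

$$\Lambda_{U,s}(\rho) = \Lambda_{U,+1}(\rho) - \Lambda_{U,-1}(\rho) + i \Lambda_{U,+i}(\rho) - i \Lambda_{U,-i}(\rho). \tag{2.2.25}$$

We see that

$$\Lambda_{U,s}(\rho) = 4 U^\dagger |p\rangle\langle q| U \rho U^\dagger |s\rangle\langle r| U. \tag{2.2.26}$$

Now, from Definition 2.2.11 (TWIRL) and the triangle inequality (using $\|\cdot\|_2 \le \|\cdot\|_1$), we have

$$\left\| \mathbb{E}_{U \sim \nu} \Lambda_{U,s}(\rho) - \mathbb{E}_{U \sim U(d)} \Lambda_{U,s}(\rho) \right\|_2 \le \frac{4\epsilon}{d^2} \|\rho\|_1. \tag{2.2.27}$$

This implies that each matrix element is small i.e.

$$\left| (\mathbb{E}_{U \sim \nu} - \mathbb{E}_{U \sim U(d)}) \langle c| U^\dagger |p\rangle\langle q| U \rho U^\dagger |s\rangle\langle r| U |d\rangle \right| \le \frac{\epsilon}{d^2} \|\rho\|_1. \tag{2.2.28}$$

Now let $\rho = |e\rangle\langle f|$. We do not have to choose a physical state since the diamond-norm bound is true for all matrices. This gives us

$$\left| (\mathbb{E}_{U \sim \nu} - \mathbb{E}_{U \sim U(d)}) U^*_{pc} U_{qe} U^*_{sf} U_{rd} \right| \le \frac{\epsilon}{d^2} \tag{2.2.29}$$

as required.

For $p = r$, we also assume that $s = q$ since if not, just take $p \ne r$ and $s = q$ and swap the labels. Here take $A_\pm = \pm |p\rangle\langle q|$ and $B = I - |q\rangle\langle q|$ and consider $\Lambda_{U,+}(\rho) - \Lambda_{U,-}(\rho) = 2 U^\dagger |p\rangle\langle q| U \rho U^\dagger |q\rangle\langle p| U$. □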
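The algebra behind Eqns. 2.2.25 and 2.2.26 can be checked numerically. The following is a minimal sketch in plain Python, not part of the original proof: the dimension d = 3, the index choice (p, q, r, s) = (0, 1, 2, 0) and all helper names are illustrative assumptions, and the random matrix X stands in for $U \rho U^\dagger$. It forms the combination of the four maps with weights $+1, -1, +i, -i$ and compares it with $4 |p\rangle\langle q| X |s\rangle\langle r|$; the contributions of $B$ cancel because the weights sum to zero.

```python
import random

d = 3
p, q, r, s = 0, 1, 2, 0  # illustrative indices; step 8 assumes p != r

def ket_bra(a, b):
    """Return the d x d matrix |a><b|."""
    return [[1 + 0j if (i, j) == (a, b) else 0j for j in range(d)] for i in range(d)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(d)) for j in range(d)] for i in range(d)]

def dagger(a):
    return [[a[j][i].conjugate() for j in range(d)] for i in range(d)]

def combine(a, b, coeff=1):
    """Return a + coeff * b entrywise."""
    return [[a[i][j] + coeff * b[i][j] for j in range(d)] for i in range(d)]

identity = [[1 + 0j if i == j else 0j for j in range(d)] for i in range(d)]
B = combine(combine(identity, ket_bra(q, q), -1), ket_bra(s, s), -1)

def channel(sigma, x):
    """Apply the map with Kraus operators A_sigma = |p><q| + sigma |r><s| and B."""
    A = combine(ket_bra(p, q), ket_bra(r, s), sigma)
    out = matmul(matmul(A, x), dagger(A))
    return combine(out, matmul(matmul(B, x), dagger(B)))

random.seed(1)
X = [[complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(d)] for _ in range(d)]

# Lambda_{+1} - Lambda_{-1} + i Lambda_{+i} - i Lambda_{-i}: each weight equals sigma.
total = [[0j] * d for _ in range(d)]
for sigma in (1, -1, 1j, -1j):
    total = combine(total, channel(sigma, X), sigma)

expected = matmul(matmul(ket_bra(p, q), X), ket_bra(s, r))
expected = [[4 * v for v in row] for row in expected]
max_err = max(abs(total[i][j] - expected[i][j]) for i in range(d) for j in range(d))
print(max_err)
```

Conjugating both sides by a fixed unitary does not affect the comparison, so checking the identity for X directly is enough.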

We remark that other types of approximate definitions are possible. For crypto- 
graphic uses, a computationally secure approximate design may be sufficient, rather 
than the information theoretic security discussed above. A computationally secure 
approximate design would be nearly indistinguishable from an exact design in poly- 
nomial time. Applications and constructions of such objects remain open problems. 

Constructions 

Here we summarise the known constructions of unitary and state designs. We will say that a k-design construction is efficient if the effort required to sample a state or unitary from the design is polynomial in n and k. Note that we do not require the number of states or unitaries to be polynomial because, even for approximate designs, an exponential number is required. Rather, the number of random bits needed to specify an element of the design should be poly(n, k).

We start with state design constructions since these have been studied far more than unitary designs. Firstly, exact efficient state 1-designs are trivial: simply choose a random state from any basis. Numerous examples of exact efficient state 2-design constructions are known (e.g. [Bar02]). Hayashi et al. [HHH06] give an inefficient construction of state k-designs for any n and k but general exact constructions are not efficient in n and k. However, Ambainis and Emerson provide an efficient approximate construction for any k with d > 2k. Aaronson [Aar09] also gives an efficient approximate construction.

Less is known about efficient constructions for unitary designs. It is straightforward to prove that the Pauli matrices form an exact 1-design and in [DLT02, Dan05] it is shown that the Clifford group (see Chapter 6 for a definition) forms an exact 2-design although no efficient exact sampling method is known. However, an approximate sampling method is given in [DLT02] and a more efficient approximate 2-design construction is given in [DCEL06]. The structure of unitary 2-designs is considered in [GAE07], providing lower bounds on the number of unitaries in the design.
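The 1-design property of the Pauli matrices is easy to confirm for a single qubit. The sketch below is an illustration only (plain Python, with helper names that are our own and not from the cited works): it averages $P \rho P^\dagger$ over the four Paulis for an arbitrary, not necessarily physical, $2 \times 2$ matrix $\rho$ and compares the result with the Haar average $\operatorname{tr}(\rho)\, I / 2$.

```python
# Single-qubit Pauli matrices as nested lists.
I2 = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]
Y = [[0, -1j], [1j, 0]]
Z = [[1, 0], [0, -1]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def dagger(a):
    return [[complex(a[j][i]).conjugate() for j in range(2)] for i in range(2)]

def pauli_twirl(rho):
    """Average P rho P^dagger over the four single-qubit Paulis."""
    out = [[0j, 0j], [0j, 0j]]
    for pauli in (I2, X, Y, Z):
        term = matmul(matmul(pauli, rho), dagger(pauli))
        for i in range(2):
            for j in range(2):
                out[i][j] += term[i][j] / 4
    return out

# An arbitrary complex matrix; the design condition holds for all matrices.
rho = [[0.3 + 0.1j, -1.2j], [2.0, 0.5 - 0.7j]]
twirled = pauli_twirl(rho)
trace = rho[0][0] + rho[1][1]
haar_average = [[trace / 2, 0], [0, trace / 2]]
max_err = max(abs(twirled[i][j] - haar_average[i][j]) for i in range(2) for j in range(2))
print(max_err)
```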

In Chapter 4 we give the first efficient approximate unitary k-design construction for k > 2. The construction works in $O(kn + \log 1/\epsilon)$ time for $k = O(n / \log n)$. Through Lemma 2.2.14, the construction is efficient for all the equivalent definitions above. We also conjecture in Chapter 3 that random quantum circuits of length poly(n, k) are approximate unitary k-designs although we only prove this for k = 2.

Chapter 3



Random Quantum Circuits 

3.1 Introduction: Pseudo-random Quantum Circuits 

Random circuits are a natural object to consider when looking at the complexity of 
random operations. They are circuits where the gates and their positions are cho- 
sen randomly from some given distribution. If the gate set that the random circuit 
chooses from is universal then, as we show below, the random circuit will converge 
to the uniform Haar measure. The advantage of considering a random circuit rather than a random unitary on the whole system is that it is naturally efficient to implement, for polynomial length circuits. Random circuits of some fixed length are also a new measure on the unitary group which, as we show later, reproduces some of the properties of the Haar measure for polynomial length. As well as the computer science
aspects, this has applications in physics since randomly interacting systems could be 
modelled as a random circuit. These systems will only reach their equilibrium if the 
random circuit converges quickly. Thus proving convergence of the random circuit 
shows that some physical systems will have some properties of Haar random systems 
after evolving for a short amount of time. 

We consider a general class of random circuits where a series of two-qubit gates are chosen from a universal gate set. We give a framework for analysing the $k^{\text{th}}$ moments of these circuits. Our conjecture, based on an analogous classical result [BH08], is that a random circuit on n qubits of length poly(n, k) is an approximate k-design. While we do not prove this, we instead give a tight analysis of the k = 2 case. We find that in a broad class of natural random circuit models (described in Section 3.1.1), a circuit of length $O(n(n + \log 1/\epsilon))$ yields an $\epsilon$-approximate 2-design. The approximate design definition used in this section is the diamond-norm definition given in Definition 2.2.10 and, through Lemma 2.2.14, the result also applies to the alternative definitions given above. Moreover, our results also apply to random stabiliser circuits, meaning that a random stabiliser circuit of length $O(n(n + \log 1/\epsilon))$ will be an $\epsilon$-approximate 2-design. This both simplifies the construction and tightens the efficiency of the approach of [DLT02],
which constructed e-approximate 2-designs in time 0{n^{v? + log 1/e)) using O(n^) 
elementary quantum gates. 

3.1.1 Random Circuits 



The random circuit we will use is the following. Choose a 2-qubit gate set that is universal on U(4) (or on the stabiliser subgroup of U(4)). One example of this is the set of all one qubit gates together with the controlled-NOT gate. Another is simply the set of all of U(4). Then, at each step, choose a random pair of qubits and apply a gate from the universal set chosen uniformly at random. For the U(4) case, the distribution will be the Haar measure on U(4). One such circuit is shown in Fig. 3.1 for n = 4 qubits. This is based on the approach used in [ODP07, DOP07] but our analysis is both simpler and more general.

Figure 3.1: An example of a random circuit. Different lines indicate a different gate is applied at each step.
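The U(4) version of this random circuit can be simulated directly on a state vector. The sketch below is illustrative only and makes several assumptions that are not in the text: Haar-random gates are drawn by Gram-Schmidt orthonormalisation of a complex Gaussian matrix, qubit `a` is identified with bit `a` of the basis-state index, and the function names `haar_unitary`, `apply_two_qubit_gate` and `random_circuit_state` are hypothetical.

```python
import random
from math import sqrt

def haar_unitary(dim, rng):
    """Gram-Schmidt on a complex Gaussian matrix; the columns form a Haar-random unitary."""
    cols = []
    for _ in range(dim):
        v = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(dim)]
        for u in cols:
            overlap = sum(a.conjugate() * b for a, b in zip(u, v))
            v = [b - overlap * a for a, b in zip(u, v)]
        norm = sqrt(sum(abs(b) ** 2 for b in v))
        cols.append([b / norm for b in v])
    # cols[j][i] is the matrix entry in row i, column j.
    return [[cols[j][i] for j in range(dim)] for i in range(dim)]

def apply_two_qubit_gate(state, gate, a, b, n):
    """Apply a 4x4 gate to qubits a and b (a != b) of an n-qubit state vector."""
    new = list(state)
    mask_a, mask_b = 1 << a, 1 << b
    for base in range(2 ** n):
        if base & mask_a or base & mask_b:
            continue  # visit each 4-dimensional block once, via its |00> member
        idx = [base, base | mask_b, base | mask_a, base | mask_a | mask_b]
        amps = [state[i] for i in idx]
        for row in range(4):
            new[idx[row]] = sum(gate[row][col] * amps[col] for col in range(4))
    return new

def random_circuit_state(n, t, seed=0):
    """Apply t Haar-random U(4) gates to random qubit pairs, starting from |0...0>."""
    rng = random.Random(seed)
    state = [0j] * (2 ** n)
    state[0] = 1 + 0j
    for _ in range(t):
        a, b = rng.sample(range(n), 2)
        state = apply_two_qubit_gate(state, haar_unitary(4, rng), a, b, n)
    return state

final_state = random_circuit_state(n=4, t=20)
print(sum(abs(amp) ** 2 for amp in final_state))
```

Working with the state vector keeps each step at cost $O(2^n)$ rather than building the full $2^n \times 2^n$ unitary; other gate sets could be substituted for `haar_unitary` without changing the rest of the sketch.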



Since the universal set can generate the whole of $U(2^n)$ in this way, such random circuits can produce any unitary. Further, since this process converges to a unitarily invariant distribution and the Haar distribution is unique, the resulting unitary must be uniformly distributed amongst all unitaries [ELL05]. Therefore this process will eventually converge to a Haar distributed unitary from $U(2^n)$. This is proven rigorously in Lemma 3.4.7. However, since a Haar unitary cannot be produced in polynomial time, this process will not converge in polynomial time. We address this problem by considering only the lower-order moments of the distribution and showing these are nearly the same for random circuits as for Haar-distributed unitaries. This claim is formally described in Theorem 3.3.1.

This chapter is organised as follows. In Section 3.2 we explain how a random circuit could be used to construct a k-design. We then summarise the results of this chapter in Section 3.3. In Section 3.4 we work out how the state evolves after a single step of the random circuit. We then extend this to multiple steps in Section 3.5 and prove our general convergence results. A key simplification will be (following [ODP07]) to map the evolution of the second moments of the quantum circuit onto a classical Markov chain. We then prove a tight convergence result for the case where the gates are chosen from U(4) in Section 3.6. This section contains most of the technical content of the chapter. Using our bounds on mixing time we put together the proof that random circuits yield approximate unitary 2-designs in Section 3.7. Section 3.8 concludes with some discussion of applications.

The majority of this chapter, with the exception of Section 3.6.4, has been published previously as [HL09b] and is joint work with Aram Harrow.

3.2 Preliminaries

3.2.1 Pauli expansion

Much of the following will be done in the Pauli basis. In this chapter, we choose the normalisation so that $\rho$ is written in the Pauli basis as

$$\rho = 2^{-n/2} \sum_p \gamma(p) \sigma_p. \tag{3.2.1}$$

With this normalisation,

$$\sum_p \gamma(p)^2 = \operatorname{tr} \rho^2 \tag{3.2.2}$$

which is 1 for pure $\rho$. In general,

$$\sum_p \gamma(p)^2 \le 1$$

with equality if and only if $\rho$ is pure. Note also that $\operatorname{tr} \rho = 1$ is equivalent to $\gamma(0) = 2^{-n/2}$.

This notation is extended to states on $nk$ qubits by treating $\gamma$ as a function of $k$ strings from $\{0, 1, 2, 3\}^n$. Thus a state $\rho$ on $nk$ qubits is written as

$$\rho = 2^{-nk/2} \sum_{p_1, \ldots, p_k} \gamma_0(p_1, \ldots, p_k) \, \sigma_{p_1} \otimes \cdots \otimes \sigma_{p_k}. \tag{3.2.3}$$
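The normalisation in Eqns. 3.2.1 and 3.2.2 can be illustrated on two qubits. The sketch below assumes the $\sigma_p$ are unnormalised Pauli strings with $\operatorname{tr} \sigma_p \sigma_q = 2^n \delta_{pq}$, so that inverting Eqn. 3.2.1 gives $\gamma(p) = 2^{-n/2} \operatorname{tr}(\sigma_p \rho)$; the function names are our own. It checks $\sum_p \gamma(p)^2 = \operatorname{tr} \rho^2$ for a pure Bell state and for the maximally mixed state, and that $\gamma(0) = 2^{-n/2}$.

```python
from itertools import product
from math import sqrt

PAULIS = [
    [[1, 0], [0, 1]],      # sigma_0 (identity)
    [[0, 1], [1, 0]],      # sigma_1 (X)
    [[0, -1j], [1j, 0]],   # sigma_2 (Y)
    [[1, 0], [0, -1]],     # sigma_3 (Z)
]

def kron(a, b):
    """Kronecker product of two matrices given as nested lists."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]

def pauli_coefficients(rho, n):
    """Return gamma(p) = 2^{-n/2} tr(sigma_p rho) for every p in {0,1,2,3}^n."""
    dim = 2 ** n
    gamma = {}
    for p in product(range(4), repeat=n):
        sigma = [[1]]
        for label in p:
            sigma = kron(sigma, PAULIS[label])
        tr = sum(sigma[i][j] * rho[j][i] for i in range(dim) for j in range(dim))
        gamma[p] = complex(tr).real / 2 ** (n / 2)
    return gamma

def purity(rho):
    """Return tr(rho^2)."""
    dim = len(rho)
    return complex(sum(rho[i][j] * rho[j][i] for i in range(dim) for j in range(dim))).real

n = 2
bell = [1 / sqrt(2), 0, 0, 1 / sqrt(2)]
rho_pure = [[a * b for b in bell] for a in bell]  # real amplitudes, so no conjugation needed
rho_mixed = [[0.25 if i == j else 0.0 for j in range(4)] for i in range(4)]

gamma_pure = pauli_coefficients(rho_pure, n)
gamma_mixed = pauli_coefficients(rho_mixed, n)
sum_sq_pure = sum(g * g for g in gamma_pure.values())
sum_sq_mixed = sum(g * g for g in gamma_mixed.values())
print(sum_sq_pure, purity(rho_pure), sum_sq_mixed, purity(rho_mixed))
```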

3.2.2 Random Circuits as k-designs

If a random circuit is to be an approximate k-design then Eqn. 2.2.12 must be satisfied where the unitaries in $\mathcal{G}_\nu$ are the different possible random circuits. We can think of this as applying the random circuit not once but k times to k different systems.

Suppose that applying t random gates yields the random circuit $W$. If $W^{\otimes k}$ acts on an $nk$-qubit state $\rho$, then the resulting state is

$$\rho_W := W^{\otimes k} \rho (W^\dagger)^{\otimes k} = 2^{-nk/2} \sum_{p_1, \ldots, p_k} \gamma_0(p_1, \ldots, p_k) \, W \sigma_{p_1} W^\dagger \otimes \cdots \otimes W \sigma_{p_k} W^\dagger. \tag{3.2.4}$$

For this to be a k-design, the expectation over all choices of random circuit should match the expectation over Haar-distributed $W \in U(2^n)$.

We are now ready to state our main results. Our results apply to a large class of gate sets which we define below:

Definition 3.2.1. Let $\mathcal{E} = \{p_i, U_i\}$ be a discrete ensemble of elements from $U(d)$. Define an operator $G_\mathcal{E}$ by

$$G_\mathcal{E} := \sum_i p_i U_i^{\otimes k,k} \tag{3.2.5}$$

where $U^{\otimes k,k} = U^{\otimes k} \otimes (U^*)^{\otimes k}$. More generally, we can consider continuous distributions. If $\mu$ is a probability measure on $U(d)$ then we can define $G_\mu$ by analogy as

$$G_\mu := \int_{U(d)} d\mu(U) \, U^{\otimes k,k}. \tag{3.2.6}$$

Then $\mathcal{E}$ (or $\mu$) is k-copy gapped if $G_\mathcal{E}$ (or $G_\mu$) has only $k!$ eigenvalues with absolute value equal to 1.

For any discrete ensemble $\mathcal{E} = \{p_i, U_i\}$, we can define a measure $\mu = \sum_i p_i \delta_{U_i}$. Thus, it suffices to state our theorems in terms of $\mu$ and $G_\mu$. We also remark that the k-copy gapped property is the same as the k-tensor product expander property for any non-zero gap as defined in Chapter 4.

The condition on $G_\mu$ in the above definition may seem somewhat strange. We will see in Section 3.4 that when $d \ge k$ there is a $k!$-dimensional subspace of $(\mathbb{C}^d)^{\otimes 2k}$ that is acted upon trivially by any $G_\mu$. Additionally, when $\mu$ is the Haar measure on $U(d)$ then $G_\mu$ is the projector onto this space. Thus, the k-copy gapped condition implies that vectors orthogonal to this space are shrunk by $G_\mu$.
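As a small concrete instance of Definition 3.2.1, the sketch below builds $G_\mathcal{E}$ for $k = 1$ and the uniform ensemble over the four single-qubit Pauli matrices (an exact 1-design, as noted in Section 2.2.3). The choice of ensemble and the helper names are ours, for illustration only. Since this $G_\mathcal{E}$ is Hermitian, checking that it squares to itself and has trace 1 shows it has exactly $1! = 1$ eigenvalue equal to 1 and all others equal to 0.

```python
PAULIS = [
    [[1, 0], [0, 1]],
    [[0, 1], [1, 0]],
    [[0, -1j], [1j, 0]],
    [[1, 0], [0, -1]],
]

def kron(a, b):
    """Kronecker product of two matrices given as nested lists."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]

def conj(a):
    """Entrywise complex conjugate (no transpose)."""
    return [[complex(x).conjugate() for x in row] for row in a]

def matmul(a, b):
    size = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)] for i in range(size)]

# G_E = sum_i p_i U_i (x) U_i^* for k = 1 with p_i = 1/4.
dim = 4
G = [[0j] * dim for _ in range(dim)]
for pauli in PAULIS:
    term = kron(pauli, conj(pauli))
    for i in range(dim):
        for j in range(dim):
            G[i][j] += term[i][j] / 4

G_squared = matmul(G, G)
is_projector = all(abs(G_squared[i][j] - G[i][j]) < 1e-12 for i in range(dim) for j in range(dim))
is_hermitian = all(abs(G[i][j] - G[j][i].conjugate()) < 1e-12 for i in range(dim) for j in range(dim))
unit_eigenvalues = sum(G[i][i] for i in range(dim)).real  # trace of a projector = its rank
print(is_projector, is_hermitian, unit_eigenvalues)
```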

We will see that $G_\mu$ is k-copy gapped in a number of important cases. First, we give a definition of universality that can apply not only to discrete gate sets, but to arbitrary measures on U(4).

Definition 3.2.2. Let $\mu$ be a distribution on U(4). Suppose that for any open ball $S \subset U(4)$ there exists a positive integer $\ell$ such that $\mu^{*\ell}(S) > 0$. Then we say $\mu$ is universal [for U(4)].

Here $\mu^{*\ell}$ is the $\ell$-fold convolution of $\mu$ with itself; i.e.

$$\mu^{*\ell} = \int \delta_{U_1 \cdots U_\ell} \, d\mu(U_1) \cdots d\mu(U_\ell).$$

When $\mu$ is a discrete distribution over a set $\{U_i\}$, Definition 3.2.2 is equivalent to the usual definition of universality for a finite set of unitary gates.

Theorem 3.2.3. The following distributions on U(4) are k-copy gapped:

(i) Any universal gate set. Examples are U(4) itself, any entangling gate together with all single qubit gates, or the gate set considered in [ODP07].

(ii) Any approximate (or exact) unitary k-design on 2 qubits, such as the uniform distribution over the 2-qubit Clifford group, which is an exact 2-design.

Proof.

(i) This is proven in Lemma 3.4.7.

(ii) This follows straight from Definition 2.2.6. □

3.3 Summary of Results 

Theorem 3.3.1. Let $\mu$ be a 2-copy gapped distribution and $W$ be a random circuit on n qubits obtained by drawing t random unitaries according to $\mu$ and applying each of them to a random pair of qubits. Then there exists $C$ (depending only on $\mu$) such that for any $\epsilon > 0$ and any $t \ge C(n(n + \log 1/\epsilon))$, $\mathcal{G}_W$ is an $\epsilon$-approximate unitary 2-design according to either Definition 2.2.10 (DIAMOND) or Definition 2.2.11 (TWIRL).

To prove Theorem 3.3.1, we show that the second moments of the random circuits converge quickly to those of a uniform Haar distributed unitary. For $W$ a circuit as in Theorem 3.3.1, write $\gamma_W(p_1, p_2)$ for the Pauli coefficients of $\rho_W = W^{\otimes 2} \rho (W^\dagger)^{\otimes 2}$. Then write $\gamma_t(p_1, p_2) = \mathbb{E}_W \gamma_W(p_1, p_2)$ where $W$ is a circuit of length t. Then we have

Lemma 3.3.2. Let $\mu$ and $W$ be as in Theorem 3.3.1. Let the initial state be $\rho$ with $\gamma_0(p, p) \ge 0$ and $\sum_p \gamma_0(p, p) = 1$ (for example the state $|\psi\rangle\langle\psi| \otimes |\psi\rangle\langle\psi|$ for any pure state $|\psi\rangle$). Then there exists a constant $C$ (possibly depending on $\mu$) such that for any $\epsilon > 0$

(i)

$$\sum_{\substack{p_1, p_2 \\ p_1 p_2 \ne 00}} \left( \gamma_t(p_1, p_2) - \delta_{p_1 p_2} \frac{1}{2^n (2^n + 1)} \right)^2 \le \epsilon \tag{3.3.1}$$

for $t \ge C n \log 1/\epsilon$.

(ii)

$$\sum_{\substack{p_1, p_2 \\ p_1 p_2 \ne 00}} \left| \gamma_t(p_1, p_2) - \delta_{p_1 p_2} \frac{1}{2^n (2^n + 1)} \right| \le \epsilon \tag{3.3.2}$$

for $t \ge C n (n + \log 1/\epsilon)$ or, when $\mu$ is the uniform distribution on U(4) or its stabiliser subgroup, $t \ge C n \log \frac{n}{\epsilon}$.

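We can then extend this to all states by a simple corollary:

Corollary 3.3.3. Let $\mu$, $W$ and $\gamma_W$ be as in Lemma 3.3.2. Then, for any initial state $\rho = 2^{-n} \sum_{p_1, p_2} \gamma_0(p_1, p_2) \, \sigma_{p_1} \otimes \sigma_{p_2}$, there exists a constant $C$ (possibly depending on $\mu$) such that for any $\epsilon > 0$

(i)

$$\sum_{\substack{p_1, p_2 \\ p_1 p_2 \ne 00}} \left( \gamma_t(p_1, p_2) - \delta_{p_1 p_2} \frac{\sum_{p \ne 0} \gamma_0(p, p)}{4^n - 1} \right)^2 \le \epsilon \tag{3.3.3}$$

for $t \ge C n (n + \log 1/\epsilon)$.

(ii)

$$\sum_{\substack{p_1, p_2 \\ p_1 p_2 \ne 00}} \left| \gamma_t(p_1, p_2) - \delta_{p_1 p_2} \frac{\sum_{p \ne 0} \gamma_0(p, p)}{4^n - 1} \right| \le \epsilon \tag{3.3.4}$$

for $t \ge C n (n + \log 1/\epsilon)$.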
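As a consistency check between Lemma 3.3.2 and Corollary 3.3.3, the limiting value of the diagonal coefficients in the corollary should reduce to the one in the lemma under the lemma's hypotheses. The short derivation below is a sketch under two assumptions: the limiting value in the corollary is $\sum_{p \ne 0} \gamma_0(p,p) / (4^n - 1)$, and the normalisation $\rho = 2^{-n} \sum \gamma_0(p_1,p_2)\, \sigma_{p_1} \otimes \sigma_{p_2}$ with $\operatorname{tr} \rho = 1$ gives $\gamma_0(0,0) = 2^{-n}$. With $\sum_p \gamma_0(p,p) = 1$ we then have:

```latex
\frac{\sum_{p \neq 0} \gamma_0(p,p)}{4^n - 1}
  = \frac{1 - \gamma_0(0,0)}{4^n - 1}
  = \frac{1 - 2^{-n}}{(2^n - 1)(2^n + 1)}
  = \frac{2^{-n}(2^n - 1)}{(2^n - 1)(2^n + 1)}
  = \frac{1}{2^n (2^n + 1)}.
```

This matches the constant $1 / (2^n (2^n + 1))$ appearing in Eqns. 3.3.1 and 3.3.2.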



By the diamond-norm definition of an approximate design (Definition 2.2.10), we only need convergence in the 2-norm (Eqn. 3.3.3), which is implied by 1-norm convergence (Eqn. 3.3.4) but weaker. However, Definition 2.2.11 (TWIRL), which requires the map to be close to the twirling operation, requires 1-norm convergence (i.e. Eqn. 3.3.4). Thus, Theorem 3.3.1 for Definition 2.2.10 (DIAMOND) follows from Corollary 3.3.3(i) and Theorem 3.3.1 for Definition 2.2.11 (TWIRL) follows from Corollary 3.3.3(ii). Theorem 3.3.1 is proved in Section 3.7 and Corollary 3.3.3 in Section 3.5.

We note that we do not need to separately prove the result for Definition 2.2.11 (TWIRL) since the result follows from the equivalence of the k-design definitions (Lemma 2.2.14). However, we include the proof since, if our bounds were improved to show convergence in $O(n \log \frac{n}{\epsilon})$ time, simply applying Lemma 2.2.14 would only imply that $O(n(n + \log 1/\epsilon))$ time was needed for Definition 2.2.11 (TWIRL).

We also emphasise that, in the course of proving Lemma 3.3.2, we prove that the eigenvalue gap (defined in Section 3.5.3) of the Markov chain that gives the evolution of the $\gamma(p, p)$ terms is $O(1/n)$. It is easy to show that this bound is tight for some gate sets.

Related work: Here we compare our work with other related results and efficient 
constructions of approximate unitary 2-designs. 

• The uniform distribution over the Clifford group on n qubits is an exact 2-design [DLT02]. Moreover, [DLT02] described how to sample from the Clifford group
using 0{rfi) classical gates and O(n^) quantum gates. Our results show that 
applying $O(n(n + \log 1/\epsilon))$ random two-qubit Clifford gates also achieves an $\epsilon$-approximate 2-design (although not necessarily a distribution that is within $\epsilon$ of uniform on the Clifford group).

• Dankert et al. [DCEL06] gave a specific circuit construction of an approximate 2-design. To achieve small error in the sense of Definition 2.2.10 (DIAMOND), their circuits require the same $O(n(n + \log 1/\epsilon))$ gates that our random circuits do. However, when we use Definition 2.2.11 (TWIRL), the circuits from [DCEL06] only need $O(n \log 1/\epsilon)$ gates while we only show that random circuits of length $O(n(n + \log 1/\epsilon))$ suffice.

• The closest results to our own are in the papers by Oliveira et al. [ODP07, DOP07], which considered a specific gate set (random single qubit gates and a
controlled-NOT) and proved that the second moments converge in time 0(n^(n+ 
log 1/ε)). Our strategy of analysing random quantum circuits in terms of classical Markov chains is also adapted from [ODP07, DOP07]. In Section 3.4 we generalise this approach to analyse the $k^{\text{th}}$ moments for arbitrary k.

Our main results extend the results of [ODP07, DOP07] to a larger class of gate sets and improve their convergence bounds. Some of these improvements have been conjectured by [Zni07], where the author presented numerical evidence in support of them.

• An algorithmic application of random circuits was given in [HH08], where they were used to construct a new class of superpolynomial quantum speedups. In
that paper, random circuits of length 0{n^) were used in order to guarantee 
that they were so-called "dispersing" circuits. Our results immediately imply 
that circuits of length O(n^) would instead suffice. We believe that this could 
be further improved with a specialised argument, since [HH08] assumed that
the input to the random circuit was always a computational basis state. 



3.4 Analysis of the Moments 

In order to prove our results, we need to understand how the state evolves after each 
step of the random circuit. In this section we consider just one step and a fixed 
pair of qubits. Later on we will extend this to prove convergence results for multiple 
steps with random pairs of qubits drawn at every step. We consider first the Haar 
distribution over the full unitary group and then will discuss the more general case of 
any 2-copy gapped distribution. 

In this section, we work in general dimension $d$ and with a general Hermitian orthogonal basis $\sigma_0, \ldots, \sigma_{d^2 - 1}$. Later we will take $d$ to be either 4 or $2^n$ and the $\sigma_i$ to be Pauli matrices. However, in this section we keep the discussion general to emphasise the potentially broader applications.

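Fix an orthonormal basis for $d \times d$ Hermitian matrices: $\sigma_0, \ldots, \sigma_{d^2 - 1}$, normalised so that $\operatorname{tr} \sigma_p \sigma_q = d \, \delta_{p,q}$. Let $\sigma_0$ be the identity. We need to evaluate the quantity

$$\mathbb{E}_U \left( U^{\otimes k} (\sigma_{p_1} \otimes \cdots \otimes \sigma_{p_k}) (U^\dagger)^{\otimes k} \right) =: T(\mathbf{p}) \tag{3.4.1}$$

where the expectation is over Haar distributed $U \in U(d)$. We will need this quantity in two cases. Firstly, for $d = 2^n$, these are the moments obtained after applying a uniformly distributed unitary so we know what the random circuit must converge to. Secondly, for $d = 4$, this tells us how a random U(4) gate acts on any chosen pair.

Call the quantity in Eqn. 3.4.1 $T(\mathbf{p})$ (we use bold to indicate a k-tuple of coefficients; take $\mathbf{p} = (p_1, \ldots, p_k)$) and write it in the $\sigma_p$ basis as

$$T(\mathbf{p}) = \sum_{\mathbf{q}} \hat{G}(\mathbf{q}; \mathbf{p}) \, \sigma_{q_1} \otimes \cdots \otimes \sigma_{q_k}. \tag{3.4.2}$$

Here, $\hat{G}(\mathbf{q}; \mathbf{p})$ is the coefficient in the Pauli expansion of $T(\mathbf{p})$ and we define $\hat{G}$ as the matrix with entries equal to $\hat{G}(\mathbf{q}; \mathbf{p})$. We have left off the usual normalisation factor because, as we shall see, with this normalisation $\hat{G}$ is a projector. Inverting this, we have

$$\hat{G}(\mathbf{q}; \mathbf{p}) = d^{-k} \operatorname{tr} \left( (\sigma_{q_1} \otimes \cdots \otimes \sigma_{q_k}) T(\mathbf{p}) \right) = d^{-k} \, \mathbb{E}_U \operatorname{tr} \left( (\sigma_{q_1} \otimes \cdots \otimes \sigma_{q_k}) U^{\otimes k} (\sigma_{p_1} \otimes \cdots \otimes \sigma_{p_k}) (U^\dagger)^{\otimes k} \right). \tag{3.4.3}$$

Note that $\hat{G}$ is real since $T$ and the basis are Hermitian.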

We can gain all the information we need about the Haar integral in Eqn. 3.4.1 with the following observations:

Lemma 3.4.1. $T(\mathbf{p})$ commutes with $U^{\otimes k}$ for any unitary $U$.

Proof. Follows from the invariance of the Haar measure on the unitary group. □

Corollary 3.4.2. $T(\mathbf{p})$ is a linear combination of permutations from the symmetric group $S_k$.

Proof. This follows from Schur-Weyl duality (see e.g. [GW98]). □

From this, we can prove that $\hat{G}$ is a projector and find its eigenvectors.

Theorem 3.4.3. $\hat{G}$ is symmetric, i.e. $\hat{G}(\mathbf{q}; \mathbf{p}) = \hat{G}(\mathbf{p}; \mathbf{q})$.

Proof. Follows from the invariance of the trace under cyclic permutations. □

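Theorem 3.4.4. $S(\pi)$ is an eigenvector of $\hat{G}$ with eigenvalue 1 for any subsystem permutation operator $S(\pi)$ i.e.

$$\sum_{\mathbf{q}} \hat{G}(\mathbf{p}; \mathbf{q}) \operatorname{tr} \left( \sigma_{q_1} \otimes \cdots \otimes \sigma_{q_k} S(\pi) \right) = \operatorname{tr} \left( \sigma_{p_1} \otimes \cdots \otimes \sigma_{p_k} S(\pi) \right).$$

Further, any vector orthogonal to this set has eigenvalue 0.

Proof. For the first part,

$$\sum_{\mathbf{q}} \hat{G}(\mathbf{p}; \mathbf{q}) \operatorname{tr} \left( \sigma_{q_1} \otimes \cdots \otimes \sigma_{q_k} S(\pi) \right) = d^{-k} \sum_{\mathbf{q}} \mathbb{E}_U \operatorname{tr} \left( \sigma_{q_1} U \sigma_{p_1} U^\dagger \right) \cdots \operatorname{tr} \left( \sigma_{q_k} U \sigma_{p_k} U^\dagger \right) \operatorname{tr} \left( \sigma_{q_1} \otimes \cdots \otimes \sigma_{q_k} S(\pi) \right)$$

$$= \operatorname{tr} \left( S(\pi) \, \mathbb{E}_U \left[ d^{-1} \sum_{q_1} \operatorname{tr} \left( \sigma_{q_1} U \sigma_{p_1} U^\dagger \right) \sigma_{q_1} \otimes \cdots \otimes d^{-1} \sum_{q_k} \operatorname{tr} \left( \sigma_{q_k} U \sigma_{p_k} U^\dagger \right) \sigma_{q_k} \right] \right). \tag{3.4.4}$$

Writing $U \sigma_p U^\dagger$ in the $\sigma_q$ basis, we find

$$d^{-1} \sum_q \operatorname{tr} \left( \sigma_q U \sigma_p U^\dagger \right) \sigma_q = U \sigma_p U^\dagger.$$

Therefore Eqn. 3.4.4 becomes

$$\operatorname{tr} \left( S(\pi) \, \mathbb{E}_U \left[ U \sigma_{p_1} U^\dagger \otimes \cdots \otimes U \sigma_{p_k} U^\dagger \right] \right) = \operatorname{tr} \left( \sigma_{p_1} \otimes \cdots \otimes \sigma_{p_k} S(\pi) \right).$$

For the second part, consider any vector $v$ which is orthogonal to the permutation operators (we can neglect the complex conjugate because $S(\pi)$ is real in this basis), i.e.

$$\sum_{\mathbf{q}} \operatorname{tr} \left( \sigma_{q_1} \otimes \cdots \otimes \sigma_{q_k} S(\pi) \right) v(\mathbf{q}) = 0 \tag{3.4.5}$$

for any permutation $\pi$. Then

$$\sum_{\mathbf{q}} \hat{G}(\mathbf{p}; \mathbf{q}) v(\mathbf{q}) = d^{-k} \sum_{\mathbf{q}} \operatorname{tr} \left( \sigma_{q_1} \otimes \cdots \otimes \sigma_{q_k} T(\mathbf{p}) \right) v(\mathbf{q})$$

which is zero since $T(\mathbf{p})$ is a linear combination of permutations and $v$ is orthogonal to this by Eqn. 3.4.5. □

Theorem 3.4.5. $\hat{G}^2 = \hat{G}$, i.e. $\sum_{\mathbf{q}'} \hat{G}(\mathbf{p}; \mathbf{q}') \hat{G}(\mathbf{q}'; \mathbf{q}) = \hat{G}(\mathbf{p}; \mathbf{q})$.

Proof. Using Eqn. 3.4.3,

$$\sum_{\mathbf{q}'} \hat{G}(\mathbf{p}; \mathbf{q}') \hat{G}(\mathbf{q}'; \mathbf{q}) = \sum_{\mathbf{q}'} \hat{G}(\mathbf{p}; \mathbf{q}') \, d^{-k} \operatorname{tr} \left( \sigma_{q'_1} \otimes \cdots \otimes \sigma_{q'_k} T(\mathbf{q}) \right).$$

From Corollary 3.4.2, $T(\mathbf{q})$ is a linear combination of permutations. This implies, using Theorem 3.4.4, that

$$\sum_{\mathbf{q}'} \hat{G}(\mathbf{p}; \mathbf{q}') \, d^{-k} \operatorname{tr} \left( \sigma_{q'_1} \otimes \cdots \otimes \sigma_{q'_k} T(\mathbf{q}) \right) = d^{-k} \operatorname{tr} \left( \sigma_{p_1} \otimes \cdots \otimes \sigma_{p_k} T(\mathbf{q}) \right) = \hat{G}(\mathbf{p}; \mathbf{q})$$

as required. □

Corollary 3.4.6. $\hat{G}$ is a projector so has eigenvalues 0 and 1.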

We now evaluate $\hat{G}$ and $T$ for the cases of k = 1 and k = 2 since these are the cases we are interested in for the remainder of the chapter.

3.4.1 k = 1

The k = 1 case is clear: the random unitary completely randomises the state. Therefore all terms in the expansion are set to zero apart from the identity i.e.

$$T(p) = \begin{cases} \sigma_0 & p = 0 \\ 0 & p \ne 0. \end{cases} \tag{3.4.6}$$

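3.4.2 k = 2

For k = 2, there are just two permutation operators, identity $I$ and swap $\mathcal{F}$. Therefore there are just two eigenvectors with non-zero eigenvalue (n > 1). In normalised form, taking them to be orthogonal, their components are

$$f_1(q_1, q_2) = \delta_{q_1 0} \delta_{q_2 0}$$

$$f_2(q_1, q_2) = \frac{1}{\sqrt{d^2 - 1}} \left( \delta_{q_1 q_2} - \delta_{q_1 0} \delta_{q_2 0} \right).$$

We will now prove three properties of $\hat{G}$ that we need:

1. $\hat{G}(p_1, p_2; q_1, q_2) = 0$ if $p_1 \ne p_2$ or $q_1 \ne q_2$.

Proof. Consider the function $f(q_1, q_2) = \delta_{q_1 a} \delta_{q_2 b}$ with $a \ne b$. This function has zero overlap with the eigenvectors $f_1$ and $f_2$ so it goes to zero when acted on by $\hat{G}$. Therefore $\hat{G}(p_1, p_2; a, b) = 0$. The claim follows from the symmetry property (Theorem 3.4.3). □

With this we will write $\hat{G}(p; q) = \hat{G}(p, p; q, q)$.

2. $\hat{G}(p; 0) = \delta_{p0}$.

Proof. Let $\hat{G}$ act on eigenvector $f_1$. □

3. $\hat{G}(p; a) = \frac{1}{d^2 - 1}$ for $a, p \ne 0$.

Proof. Let $\hat{G}$ act on the input $\delta_{qa}$. This has zero overlap with $f_1$ and overlap $\frac{1}{\sqrt{d^2 - 1}}$ with $f_2$. □

Therefore we have

$$\hat{G}(p_1, p_2; q_1, q_2) = \begin{cases} 0 & p_1 \ne p_2 \text{ or } q_1 \ne q_2 \\ 1 & p_1 = p_2 = q_1 = q_2 = 0 \\ \frac{1}{d^2 - 1} & p_1 = p_2 \ne 0, \; q_1 = q_2 \ne 0. \end{cases} \tag{3.4.7}$$

Since $T(p_1, p_2) = \sum_{q_1, q_2} \hat{G}(p_1, p_2; q_1, q_2) \, \sigma_{q_1} \otimes \sigma_{q_2}$, we have

$$T(p_1, p_2) = \begin{cases} 0 & p_1 \ne p_2 \\ \sigma_0 \otimes \sigma_0 & p_1 = p_2 = 0 \\ \frac{1}{d^2 - 1} \sum_{q \ne 0} \sigma_q \otimes \sigma_q & p_1 = p_2 \ne 0. \end{cases} \tag{3.4.8}$$

Therefore the terms $\sigma_{p_1} \otimes \sigma_{p_2}$ with $p_1 \ne p_2$ are set to zero. Further, the sum of the diagonal coefficients $\gamma(p, p)$ is conserved. This allows us to identify this with a probability distribution (after renormalising) and use Markov chain analysis. To see this, write again the starting state

$$\rho = 2^{-n} \sum_{q_1, q_2} \gamma_0(q_1, q_2) \, \sigma_{q_1} \otimes \sigma_{q_2}$$

with state after application of any unitary $W$

$$\rho_W = 2^{-n} \sum_{q_1, q_2} \gamma_W(q_1, q_2) \, \sigma_{q_1} \otimes \sigma_{q_2} = 2^{-n} \sum_{q_1, q_2} \gamma_0(q_1, q_2) \left( W \sigma_{q_1} W^\dagger \right) \otimes \left( W \sigma_{q_2} W^\dagger \right).$$

Then

$$\sum_q \gamma_W(q, q) = \operatorname{tr} \left( \mathcal{F} \rho_W \right) = 2^{-n} \sum_{q_1, q_2} \gamma_0(q_1, q_2) \operatorname{tr} \left( \mathcal{F} \left( W \sigma_{q_1} W^\dagger \otimes W \sigma_{q_2} W^\dagger \right) \right) = 2^{-n} \sum_{q_1, q_2} \gamma_0(q_1, q_2) \operatorname{tr} \left( \sigma_{q_1} \sigma_{q_2} \right) = \sum_q \gamma_0(q, q)$$

as required, where $\mathcal{F}$ is the swap operator and we have used Lemmas 1.2.2 and 1.2.1.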
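The $k = 2$ values of $\hat{G}$ can be checked exactly for a single qubit ($d = 2$) by replacing the Haar average in Eqn. 3.4.3 with an average over the single-qubit Clifford group, which Section 2.2.3 notes is an exact 2-design. The sketch below is an illustration under our own assumptions: the group is generated by closing the Hadamard and phase gates under multiplication while ignoring global phases, and all function names are hypothetical. The expected values are $\hat{G} = 1$ for all indices zero, $1/(d^2 - 1) = 1/3$ for $p_1 = p_2 \ne 0$, $q_1 = q_2 \ne 0$, and 0 when $p_1 \ne p_2$.

```python
from math import sqrt

PAULIS = [
    [[1, 0], [0, 1]],
    [[0, 1], [1, 0]],
    [[0, -1j], [1j, 0]],
    [[1, 0], [0, -1]],
]
H = [[1 / sqrt(2), 1 / sqrt(2)], [1 / sqrt(2), -1 / sqrt(2)]]
S = [[1, 0], [0, 1j]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def dagger(a):
    return [[complex(a[j][i]).conjugate() for j in range(2)] for i in range(2)]

def phase_free_key(u):
    """Hashable label for u that ignores its global phase."""
    first = next(complex(x) for row in u for x in row if abs(x) > 1e-9)
    phase = first / abs(first)
    return tuple(
        (round((x / phase).real, 6) + 0.0, round((x / phase).imag, 6) + 0.0)
        for row in u for x in row
    )

def single_qubit_cliffords():
    """Close {H, S} under multiplication, identifying matrices that differ by a phase."""
    identity = PAULIS[0]
    found = {phase_free_key(identity): identity}
    frontier = [identity]
    while frontier:
        u = frontier.pop()
        for g in (H, S):
            v = matmul(g, u)
            key = phase_free_key(v)
            if key not in found:
                found[key] = v
                frontier.append(v)
    return list(found.values())

def conj_trace(q, p, u):
    """Return tr(sigma_q U sigma_p U^dagger)."""
    m = matmul(PAULIS[q], matmul(u, matmul(PAULIS[p], dagger(u))))
    return m[0][0] + m[1][1]

def g_hat(q1, q2, p1, p2, group):
    """Eqn. 3.4.3 for d = k = 2, with the Haar average replaced by a group average."""
    total = sum(conj_trace(q1, p1, u) * conj_trace(q2, p2, u) for u in group)
    return complex(total).real / (4 * len(group))

cliffords = single_qubit_cliffords()
print(len(cliffords), g_hat(0, 0, 0, 0, cliffords), g_hat(2, 2, 1, 1, cliffords))
```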

3.4.3 Moments for General Universal Random Circuits 

We now consider universal distributions $\mu$ that in general may be different from the uniform (Haar) measure on $U(d)$. Our main result in this section will be to show that a universal distribution on $U(4)$ is also 2-copy gapped. In fact, we will phrase this result in slightly more general terms and show that a universal distribution on $U(d)$ is also $k$-copy gapped for any $k$. Universality (Definition 3.2.2) generalises in the obvious way to $U(d)$, whereas when we say that $\mu$ is $k$-copy gapped, we mean that

$$\|G_\mu - G_{U(d)}\|_\infty < 1, \qquad (3.4.9)$$

where $G_\mu = \mathbb{E}_U\, U^{\otimes k,k}$, with the expectation taken over $\mu$ for $G_\mu$ or over the Haar measure for $G_{U(d)}$.

The reason Eqn. 3.4.9 represents our condition for $\mu$ to be $k$-copy gapped is as follows: Observe that $\hat{G}$ and $G$ are unitarily related, so the definition of $k$-copy gapped could equivalently be given in terms of $\hat{G}$. We have shown above that $\hat{G}_{U(d)}$ (and thus $G_{U(d)}$) has all eigenvalues equal to 0 or 1, i.e. it is a projector. By contrast, $G_\mu$ may not even be Hermitian. However, we will prove below that all eigenvectors of $G_{U(d)}$ with eigenvalue 1 are also eigenvectors of $G_\mu$ with eigenvalue 1. Thus, Eqn. 3.4.9 will imply that $\lim_{\ell \to \infty} (G_\mu)^\ell = G_{U(d)}$, just as we would expect for a gapped random walk.

We would like to show that Eqn. 3.4.9 holds whenever $\mu$ is universal. This result was proved in [AK62] (and was probably known even earlier) when $\mu$ had the form $(\delta_{U_1} + \delta_{U_2})/2$. Here we show how to extend the argument to any universal $\mu$.

Lemma 3.4.7. Let $\mu$ be a distribution on $U(d)$. Then all eigenvectors of $G_{U(d)}$ with eigenvalue 1 are eigenvectors of $G_\mu$ with eigenvalue 1. Additionally, if $\mu$ is universal then $\mu$ is $k$-copy gapped for any positive integer $k$ (cf. Eqn. 3.4.9).

In particular, if $k = 2$ this Lemma implies that $\mu$ is 2-copy gapped (cf. Theorem 3.3.1).

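Proof. Let $V \cong \mathbb{C}^d$ be the fundamental representation of $U(d)$, where the action of $U \in U(d)$ is simply $U$ itself. Let $V^*$ be its dual representation, where $U$ acts as $U^*$. The operators $G_\mu$ and $G_{U(d)}$ act on the space $V^{\otimes k} \otimes (V^*)^{\otimes k}$. We will see that $G_{U(d)}$ is completely determined by the decomposition of $V^{\otimes k} \otimes (V^*)^{\otimes k}$ into irreducible representations (irreps). Suppose that the multiplicity of $(r_\lambda, V_\lambda)$ in $V^{\otimes k} \otimes (V^*)^{\otimes k}$ is $m_\lambda$, where the $V_\lambda$'s are the irrep spaces and $r_\lambda(U)$ the corresponding representation matrices. In other words

$$V^{\otimes k} \otimes (V^*)^{\otimes k} \cong \bigoplus_\lambda V_\lambda \otimes \mathbb{C}^{m_\lambda} \qquad (3.4.10)$$

$$U^{\otimes k} \otimes (U^*)^{\otimes k} \sim \sum_\lambda |\lambda\rangle\langle\lambda| \otimes r_\lambda(U) \otimes I_{m_\lambda} \qquad (3.4.11)$$

Here $\sim$ indicates that the two sides are related by conjugation by a fixed ($U$-independent) unitary.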

Let $\lambda = 0$ denote the trivial irrep: i.e. $V_0 = \mathbb{C}$ and $r_0(U) = 1$ for all $U$. We claim that $\mathbb{E}_U r_\lambda(U) = 0$ whenever $\lambda \neq 0$ and the expectation is taken over the Haar measure. To show this, note that $\mathbb{E}_U r_\lambda(U)$ commutes with $r_\lambda(V)$ for all $V \in U(d)$ and thus, by Schur's Lemma, we must have $\mathbb{E}_U r_\lambda(U) = cI$ for some $c \in \mathbb{C}$. However, by the translation-invariance of the Haar measure we have $cI = \mathbb{E}_U r_\lambda(U) = \mathbb{E}_U r_\lambda(UV) = c\, r_\lambda(V)$ for all $V \in U(d)$. Since $\lambda \neq 0$, we cannot have $r_\lambda(V) = I$ for all $V$ and so it must be that $c = 0$.

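Thus, if we write $G_{U(d)}$ and $G_\mu$ using the basis on the RHS of Eqn. 3.4.11 we have

$$G_{U(d)} = |0\rangle\langle 0| \otimes I_{m_0} \qquad (3.4.12)$$

where $|0\rangle\langle 0|$ is a projector onto the trivial irrep. On the other hand,

$$G_\mu = |0\rangle\langle 0| \otimes I_{m_0} + \sum_{\lambda \neq 0} |\lambda\rangle\langle\lambda| \otimes \int d\mu(U)\, r_\lambda(U) \otimes I_{m_\lambda}. \qquad (3.4.13)$$

Thus, every eigenvector of $G_{U(d)}$ with eigenvalue one is also fixed by $G_\mu$. For the remainder of the space, the direct sum structure means that

$$\|G_\mu - G_{U(d)}\|_\infty = \max_{\lambda \neq 0,\ m_\lambda \neq 0} \left\| \int r_\lambda(U)\, d\mu(U) \right\|_\infty. \qquad (3.4.14)$$

Note that this maximisation only includes $\lambda$ with $\dim V_\lambda > 1$. This is because non-trivial one-dimensional irreps of $U(d)$ have the form $\det U^m$ for some non-zero integer $m$. Under the map $U \to e^{i\phi} U$, such irreps pick up a phase of $e^{imd\phi}$. However, $U^{\otimes k} \otimes (U^*)^{\otimes k}$ is invariant under $U \to e^{i\phi} U$. Thus $V^{\otimes k} \otimes (V^*)^{\otimes k}$ cannot contain any non-trivial one-dimensional irreps.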

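Now suppose by contradiction that there exists $\lambda \neq 0$ with $m_\lambda \neq 0$ and

$$\left\| \int r_\lambda(U)\, d\mu(U) \right\|_\infty = 1.$$

(We do not need to consider the case $\| \int r_\lambda(U)\, d\mu(U) \|_\infty > 1$, since $\|r_\lambda(U)\|_\infty = 1$ for all $U$ and $\|\cdot\|_\infty$ obeys the triangle inequality.) Indeed, the triangle inequality further implies that there exists a unit vector $|v\rangle \in V_\lambda$ such that

$$\int d\mu(U)\, r_\lambda(U) |v\rangle = \omega |v\rangle,$$

for some $\omega \in \mathbb{C}$ with $|\omega| = 1$.

By the above argument we can assume that $\dim V_\lambda > 1$. Since $V_\lambda$ is irreducible, it cannot contain a one-dimensional invariant subspace, implying that there exists $U_0 \in U(d)$ such that

$$|\langle v| r_\lambda(U_0) |v\rangle| = 1 - \delta,$$

for some $\delta > 0$. Since $U \mapsto |\langle v| r_\lambda(U) |v\rangle|$ is continuous, there exists an open ball $S$ around $U_0$ such that $|\langle v| r_\lambda(U) |v\rangle| \le 1 - \delta/2$ for all $U \in S$. Define $\bar{S} := U(d) \setminus S$.

Now we use the fact that $\mu$ is universal to find an $\ell$ such that $\mu^{\star\ell}(S) > 0$. Next, observe that $\int d\mu^{\star\ell}(U)\, \langle v| r_\lambda(U) |v\rangle = \omega^\ell$. Taking the absolute value of both sides yields

$$\begin{aligned} 1 &= \left| \int_{U(d)} d\mu^{\star\ell}(U)\, \langle v| r_\lambda(U) |v\rangle \right| \\ &\le \int_{U(d)} d\mu^{\star\ell}(U)\, |\langle v| r_\lambda(U) |v\rangle| \\ &= \int_{S} d\mu^{\star\ell}(U)\, |\langle v| r_\lambda(U) |v\rangle| + \int_{\bar{S}} d\mu^{\star\ell}(U)\, |\langle v| r_\lambda(U) |v\rangle| \\ &\le \mu^{\star\ell}(S) \left(1 - \frac{\delta}{2}\right) + \left(1 - \mu^{\star\ell}(S)\right) \\ &< 1, \end{aligned}$$

a contradiction. We conclude that $\|G_{U(d)} - G_\mu\|_\infty < 1$. □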



3.5 Convergence 

In the previous section we saw that iterating any universal gate set on $U(d)$ eventually converges to the uniform distribution on $U(d)$. Since the set of all two-qubit unitaries is universal on $U(2^n)$, this implies that random circuits eventually converge to the Haar measure. In this section, we turn to proving upper bounds on this convergence rate, focusing on the first two moments.

Let $\hat{G}^{(ij)}$ be the matrix with $\hat{G}$ (with $d = 4$) acting on qubits $i$ and $j$ and the identity on the others. Then, if the pair $(i, j)$ is chosen at step $t$, we can find the expected coefficients at step $t + 1$ by multiplying by $\hat{G}^{(ij)}$. In general, a random pair is chosen at each step. So

$$\gamma_{t+1}(p) = \frac{1}{n(n-1)} \sum_{i \neq j} \sum_q \hat{G}^{(ij)}(p; q)\, \gamma_t(q) \qquad (3.5.1)$$

where $\gamma_t$ are the expected coefficients at step $t$. We can think of this evolution as repeated application of the matrix

$$P = \frac{1}{n(n-1)} \sum_{i \neq j} \hat{G}^{(ij)}. \qquad (3.5.2)$$

For $k = 2$, the key idea of Oliveira et al. [ODP07] was to map the evolution of the $\gamma(p, p)$ coefficients to a Markov chain. The $\gamma(p_1, p_2)$ coefficients with $p_1 \neq p_2$ just decay as each qubit is chosen and can be analysed directly.

However, we can only map the $\gamma(p, p)$ coefficients to a probability distribution when they are non-negative, which is not the case for general states. Most of the rest of the chapter is dedicated to proving Lemma 3.3.2, which only applies to states with $\gamma(p, p) \ge 0$ and normalised so their sum is 1. Corollary 3.3.3 then extends this to all states:

Proof of Corollary 3.3.3. Lemma 3.3.2 still applies to the $\gamma(p_1, p_2)$ terms with $p_1 \neq p_2$. Therefore we just need to show how to apply Lemma 3.3.2 to states that initially have some negative $\gamma(p, p)$ terms.

For the $\gamma(p, p)$ terms, Lemma 3.3.2 says that the random walk starting with any initial probability distribution converges to uniform in some bounded time $t$. Let $g_t(p, p; q, q)$ be the coefficients after $t$ steps of the walk starting at a particular point $q$ (i.e. $g_0(p, p; q, q) = \delta_{p,q}$). Now, for any starting state $\rho$, let the initial coefficients be $\gamma_0(p, p)$. Then, by linearity, we can write the expected coefficients after $t$ steps $\gamma_t(p, p) := \mathbb{E}\, \gamma_W(p, p)$ as

$$\gamma_t(p, p) = \sum_{q \neq 0} \gamma_0(q, q)\, g_t(p, p; q, q) \qquad (3.5.3)$$

for $p \neq 0$.

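We can now prove convergence rates for the expected coefficients $\gamma_t(p, p)$.

(i) For the 2-norm, we have from Lemma 3.3.2 that for $t \ge Cn \log 1/\epsilon$

$$\sum_{p \neq 0} \left( g_t(p, p; q, q) - \frac{1}{4^n - 1} \right)^2 \le \epsilon \qquad (3.5.4)$$

for any $q$. Note that the normalisation for the $\gamma(p, p)$ terms with $p \neq 0$ has changed from Lemma 3.3.2 since we are neglecting the $\gamma(0, 0)$ term here. Now

$$\begin{aligned} \sum_{p \neq 0} \left( \gamma_t(p, p) - \frac{\sum_{q \neq 0} \gamma_0(q, q)}{4^n - 1} \right)^2 &\le \sum_{q \neq 0} \gamma_0(q, q)^2 \sum_{p \neq 0,\, q' \neq 0} \left( g_t(p, p; q', q') - \frac{1}{4^n - 1} \right)^2 \\ &\le (4^n - 1)\, \epsilon \sum_{q \neq 0} \gamma_0(q, q)^2 \\ &\le 4^n \epsilon \sum_{q_1, q_2} \gamma_0(q_1, q_2)^2 \\ &= 4^n \epsilon \left( \operatorname{tr} \rho^2 \right)^2 \\ &\le 4^n \epsilon \end{aligned}$$

where the first inequality is the Cauchy-Schwarz inequality. Therefore for $t \ge Cn(n + \log 4^n/\epsilon)$, the 2-norm distance from stationarity for the $\gamma(p, p)$ terms is at most $\epsilon$. Choose $C'$ such that $C'n(n + \log 1/\epsilon) \ge Cn(n + \log 4^n/\epsilon)$ to obtain the result.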

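(ii) For the 1-norm, Lemma 3.3.2 says that for $t \ge Cn(n + \log 1/\epsilon)$

$$\sum_{p \neq 0} \left| g_t(p, p; q, q) - \frac{1}{4^n - 1} \right| \le \epsilon. \qquad (3.5.5)$$

We can then proceed much as for the 2-norm case:

$$\begin{aligned} \sum_{p \neq 0} \left| \gamma_t(p, p) - \frac{\sum_{q \neq 0} \gamma_0(q, q)}{4^n - 1} \right| &= \sum_{p \neq 0} \left| \sum_{q \neq 0} \gamma_0(q, q) \left( g_t(p, p; q, q) - \frac{1}{4^n - 1} \right) \right| \\ &\le \sum_{q \neq 0} |\gamma_0(q, q)| \sum_{p \neq 0} \left| g_t(p, p; q, q) - \frac{1}{4^n - 1} \right| \\ &\le \epsilon \sum_{q \neq 0} |\gamma_0(q, q)| \\ &\le 2^n \epsilon. \end{aligned}$$

Therefore for $t \ge Cn(n + \log 2^n/\epsilon)$, the 1-norm distance from stationarity for the $\gamma(p, p)$ terms is at most $\epsilon$. □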



We now proceed to prove Lemma 3.3.2. Firstly, we will consider the simple case of $k = 1$ to prove this process forms a 1-design, as this will help us to understand the more complicated case of $k = 2$.



3.5.1 First Moments Convergence 

Recall that $\rho = 2^{-n/2} \sum_p \gamma(p)\, \sigma_p$ and we wish to evaluate the moments of the coefficients. So for the first moments to converge, we want to know $\mathbb{E}\, \gamma(p)$.

For $k = 1$, the $U(4)$ random circuit uniformly randomises each pair that is chosen. More precisely, a pair of sites $i, j$ are chosen at random and all the coefficients with $p_i \neq 0$ or $p_j \neq 0$ are set to zero. Thus we get an exact 1-design when all sites have been hit. For other gate sets, the terms do not decay to zero but decay by a factor depending on the gap of $G$. Call the gap $\Delta$; for $U(4)$ $\Delta = 1$ and for others $0 < \Delta < 1$ and $\Delta$ is independent of $n$. Therefore once each site has been hit $m$ times the terms have decayed by a factor $(1 - \Delta)^m$.

For a bound like the mixing time (see Section 3.5.3 for definition), we want to bound the quantity $\sum_{p \neq 0} |\mathbb{E}_W \gamma_W(p)|$ where $\gamma_W(p)$ is the Pauli coefficient after applying the random circuit $W$. We also want 2-norm bounds, so we bound $\sum_{p \neq 0} (\mathbb{E}_W \gamma_W(p))^2$ too. We will in fact find bounds on

$$\sum_{p \neq 0} \mathbb{E}_W |\gamma_W(p)|$$

and

$$\sum_{p \neq 0} \left( \mathbb{E}_W |\gamma_W(p)| \right)^2$$

which are stronger.

A standard problem in the theory of randomised algorithms is the coupon collector problem. If a magazine comes with a free coupon, which is chosen uniformly randomly from $n$ different types, how many magazines should you buy to have a high probability of getting all $n$ coupons? It is not hard to show that $n \ln \frac{n}{\epsilon}$ samples (magazines) have at least a $1 - \epsilon$ probability of including all $n$ coupons. Using this, we expect all sites to be hit with probability at least $1 - \epsilon$ after $O(n \log \frac{n}{\epsilon})$ steps. This argument can be made precise in this context by bounding the non-identity coefficients. We find, as expected, that the sum is small after $O(n \log n)$ steps:

Lemma 3.5.1. After $O(n \log 1/\epsilon)$ steps

$$\sum_{p \neq 0} \left( \mathbb{E}_W |\gamma_W(p)| \right)^2 \le \epsilon$$

and after $O(n \log \frac{n}{\epsilon})$ steps,

$$\sum_{p \neq 0} \mathbb{E}_W |\gamma_W(p)| \le \epsilon. \qquad (3.5.6)$$

Proof. At each step, a pair of sites is chosen at random and any terms with non-identity coefficients for this pair decay by a factor $(1 - \Delta)$. For example, the term $\sigma_1 \otimes \sigma_0^{\otimes (n-1)}$ decays whenever the first site is chosen. Thus the probability of each term decaying depends on the number of zeroes. We start with the 1-norm bound.

Suppose the circuit applied after $t$ steps is $W_t$. Consider $\mathbb{E}_{W_t} |\gamma_{W_t}(p)|$ for $p$ with $d$ non-zeroes. Since the state $\rho$ is physical, $\operatorname{tr} \rho^2 \le 1$ so $\sum_p \gamma_0(p)^2 \le 1$. Now, in each step, if any site is chosen where $p$ is non-zero, this term decays by a factor $(1 - \Delta)$. This occurs with probability $1 - \frac{(n-d)(n-d-1)}{n(n-1)} \ge d/n$, the probability of choosing a pair where at least one site is non-zero. Therefore

$$\mathbb{E}\, |\gamma_{W_t}(p)| \le \left( (1 - \Delta)\, d/n + (1 - d/n) \right) |\gamma_{W_{t-1}}(p)|$$

where the expectation is over the circuit applied at step $t$. If we iterate this $t$ times we find

$$\mathbb{E}_{W_t} |\gamma_{W_t}(p)| \le \exp(-\Delta t d/n)\, |\gamma_0(p)|$$

where the expectation here is over all random circuits for the $t$ steps. We now sum over all $p$:

$$\sum_{p \neq 0} \mathbb{E}_W |\gamma_W(p)| \le \sum_{d=1}^{n} \exp(-\Delta t d/n) \sum_{p:\, d(p) = d} |\gamma_0(p)|$$

where $d(p)$ is the number of non-zeroes in $p$. For the 1-norm bound, we can simply bound $|\gamma_0(p)| \le 1$ to give $\sum_{p:\, d(p) = d} |\gamma_0(p)| \le 3^d \binom{n}{d}$ so

$$\sum_{p \neq 0} \mathbb{E}_W |\gamma_W(p)| \le \left( 1 + 3 \exp(-\Delta t/n) \right)^n - 1$$

where we have used the binomial theorem. Now let $t = \frac{n}{\Delta} \ln \frac{3n}{\epsilon}$. This gives

$$\sum_{p \neq 0} \mathbb{E}_W |\gamma_W(p)| \le (1 + \epsilon/n)^n - 1 = O(\epsilon).$$
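The 1-norm bound just derived can be checked numerically. The sketch below is illustrative only: the function names are hypothetical, and the step count $t = \frac{n}{\Delta} \ln \frac{3n}{\epsilon}$ is an assumption, chosen so that $3\exp(-\Delta t/n) = \epsilon/n$. It evaluates $(1 + 3\exp(-\Delta t/n))^n - 1$ and collects the values for comparison against $2\epsilon$, which upper-bounds $e^\epsilon - 1$ whenever $\epsilon \le 1$.

```python
import math

def steps_for_one_norm(n, gap, eps):
    """Assumed step count t = (n / gap) * ln(3n / eps), so that 3 exp(-gap t / n) = eps / n."""
    return (n / gap) * math.log(3.0 * n / eps)

def one_norm_bound(n, gap, t):
    """Evaluate (1 + 3 exp(-gap t / n))^n - 1, using log1p/expm1 to avoid cancellation."""
    x = 3.0 * math.exp(-gap * t / n)
    return math.expm1(n * math.log1p(x))

def bound_table(ns=(2, 10, 100, 1000), gaps=(1.0, 0.25), epsilons=(0.5, 0.1, 1e-3)):
    """Return (n, gap, eps, bound) tuples for a small grid of parameters."""
    rows = []
    for n in ns:
        for gap in gaps:
            for eps in epsilons:
                t = steps_for_one_norm(n, gap, eps)
                rows.append((n, gap, eps, one_norm_bound(n, gap, t)))
    return rows

for n, gap, eps, bound in bound_table():
    print(f"n={n:5d} gap={gap:4.2f} eps={eps:7.3g} bound={bound:.6g}")
```

Since $(1 + \epsilon/n)^n \le e^\epsilon$, the printed bound stays within a constant multiple of $\epsilon$ independently of $n$, which is the $O(\epsilon)$ behaviour used in the proof.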
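For the 2-norm bound,

$$\begin{aligned} \sum_{p \neq 0} \left( \mathbb{E}_W |\gamma_W(p)| \right)^2 &\le \sum_{d=1}^{n} \exp(-2\Delta t d/n) \sum_{p:\, d(p) = d} \gamma_0^2(p) \\ &\le \sum_{d=1}^{n} \exp(-2\Delta t d/n) \\ &\le \frac{\exp(-2\Delta t/n)}{1 - \exp(-2\Delta t/n)} \end{aligned}$$

where we have used $\sum_p \gamma_0^2(p) \le 1$. We find after $\frac{n}{2\Delta} \ln 1/\epsilon$ steps that

$$\sum_{p \neq 0} \left( \mathbb{E}_W |\gamma_W(p)| \right)^2 \le \frac{\epsilon}{1 - \epsilon}.$$

□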
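The coupon collector estimate quoted before Lemma 3.5.1 can be verified exactly for small $n$. The following sketch (function names are illustrative, not from the source) computes the probability of having seen all $n$ coupon types after $t$ uniform draws by inclusion-exclusion, and evaluates it at $t = \lceil n \ln(n/\epsilon) \rceil$. Note that it models single coupons being drawn, as in the magazine example, rather than the pairs of sites chosen by the random circuit.

```python
import math

def prob_all_collected(n, t):
    """Exact probability that t uniform draws from n coupon types include every type.

    Inclusion-exclusion over the set of j types that are missed entirely.
    """
    return sum((-1) ** j * math.comb(n, j) * (1 - j / n) ** t for j in range(n + 1))

def draws_needed(n, eps):
    """The n ln(n / eps) estimate, rounded up to an integer number of draws."""
    return math.ceil(n * math.log(n / eps))

for n, eps in [(5, 0.1), (10, 0.05), (20, 0.01)]:
    t = draws_needed(n, eps)
    print(f"n={n:2d} eps={eps:5.2f} t={t:3d} P(all seen)={prob_all_collected(n, t):.5f}")
```

The guarantee follows from the union bound $n(1 - 1/n)^t \le n e^{-t/n}$ on the probability of missing some type, so the exact probabilities printed here are somewhat larger than $1 - \epsilon$.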

3.5.2 Second Moments Convergence 

Firstly, the $\sigma_{p_1} \otimes \sigma_{p_2}$ terms for $p_1 \neq p_2$ decay in a similar way to the non-identity terms in the 1-design analysis. In fact, the proof of Lemma 3.5.1 carries over almost identically to this case to give

Lemma 3.5.2. After $O(n \log 1/\epsilon)$ steps

$$\sum_{p_1 \neq p_2} \left( \mathbb{E}_W |\gamma_W(p_1, p_2)| \right)^2 \le \epsilon$$

and after $O(n(n + \log 1/\epsilon))$ steps

$$\sum_{p_1 \neq p_2} \mathbb{E}_W |\gamma_W(p_1, p_2)| \le \epsilon.$$

Proof. Instead of the number of zeroes governing the decay rate, we need to count the number of places where $p_1$ and $p_2$ differ. This gives

$$\mathbb{E}\, |\gamma_{W_t}(p_1, p_2)| \le \left( (1 - \Delta)\, d/n + (1 - d/n) \right) |\gamma_{W_{t-1}}(p_1, p_2)|$$

where now $d$ is the number of differing sites. There are $4^n\, 3^d \binom{n}{d}$ pairs $(p_1, p_2)$ which differ in $d$ places so we find

$$\sum_{p_1 \neq p_2} \mathbb{E}_W |\gamma_W(p_1, p_2)| \le 4^n \left[ \left( 1 + 3 \exp(-\Delta t/n) \right)^n - 1 \right].$$

Set $t = \frac{n}{\Delta} (n \ln 4 + \ln 1/\epsilon)$ to make this $O(\epsilon)$. The 2-norm bound follows in the same way as for Lemma 3.5.1. □

We now need to prove the $\gamma(p, p)$ terms converge quickly. We have seen above that the sum of the terms $\gamma(p, p)$ is conserved and, for the purposes of proving Lemma 3.3.2, we assume the sum is 1 and $\gamma(p, p) \ge 0$ for all $p$.

To illustrate the evolution, consider the simplest case when the gates are chosen from $U(4)$. We have evaluated $\hat{G}$ in Section 3.4.2 for $k = 2$ for this case. Translated into coefficients this yields the following update rule, where we have written it for the case when qubits 1 and 2 are chosen:

$$\gamma_{t+1}(r_1, r_2, r_3, \ldots, r_n, s_1, s_2, s_3, \ldots, s_n) = \begin{cases} 0 & (r_1, r_2) \neq (s_1, s_2) \\ \gamma_t(0, 0, r_3, \ldots, r_n, 0, 0, s_3, \ldots, s_n) & (r_1, r_2) = (s_1, s_2) = (0, 0) \\ \frac{1}{15} \sum_{r_1' r_2' \neq 00} \gamma_t(r_1', r_2', r_3, \ldots, r_n, r_1', r_2', s_3, \ldots, s_n) & (r_1, r_2) = (s_1, s_2) \neq (0, 0). \end{cases} \qquad (3.5.7)$$

The key idea of Oliveira et al. [ODP07] was to map the evolution of the $\gamma(p, p)$ coefficients to a Markov chain. We can apply this here to get, on state space $\{0, 1, 2, 3\}^n$, the evolution:

1. Choose a pair of sites uniformly at random.

2. If the state is 00 it remains 00.

3. Otherwise, choose the state uniformly at random from $\{0, 1, 2, 3\}^2 \setminus \{00\}$.

This is the correct evolution since, if the initial state is distributed according to $\gamma_t(q, q)$, the final state is distributed according to $\gamma_{t+1}(p, p)$.
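The three-step evolution above can be written down exactly for a small number of sites. The sketch below is a hypothetical illustration (the function names are not from the source): it computes the one-step transition distribution with exact fractions and checks, for $n = 3$, that the all-zero state is fixed and that the uniform distribution on the remaining $4^n - 1$ states is stationary.

```python
from fractions import Fraction
from itertools import product

# The 15 allowed values of a chosen pair after it has been randomised.
NONZERO_PAIRS = [ab for ab in product(range(4), repeat=2) if ab != (0, 0)]

def transition_row(state):
    """Exact distribution after one step of the U(4) chain started from `state` (a tuple)."""
    n = len(state)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    w_pair = Fraction(1, len(pairs))  # step 1: each pair of sites is equally likely
    row = {}
    for i, j in pairs:
        if (state[i], state[j]) == (0, 0):
            # step 2: a 00 pair is left unchanged
            row[state] = row.get(state, 0) + w_pair
            continue
        for a, b in NONZERO_PAIRS:
            # step 3: replace the pair by a uniformly random non-00 value
            new = list(state)
            new[i], new[j] = a, b
            new = tuple(new)
            row[new] = row.get(new, 0) + w_pair / 15
    return row

def evolve(dist):
    """Apply one step of the chain to a distribution given as {state: probability}."""
    out = {}
    for state, weight in dist.items():
        for target, prob in transition_row(state).items():
            out[target] = out.get(target, 0) + weight * prob
    return out

n = 3
nonzero_states = [s for s in product(range(4), repeat=n) if any(s)]
uniform = {s: Fraction(1, len(nonzero_states)) for s in nonzero_states}
print(evolve(uniform) == uniform, transition_row((0,) * n))
```

Unordered pairs are used here; choosing ordered pairs would give the same chain because the update is symmetric in the two sites.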

The evolution for other gate sets will be similar, but the states will not be chosen uniformly randomly in the third step. However, the state 00 will remain 00 and the stationary distribution on the other 15 states is the same. We will find the convergence times for general gate sets and then consider the $U(4)$ gate set since we can perform a tight analysis for this case.

3.5.3 Markov Chain Analysis 

Before finding the convergence rate for our problem, we will briefly introduce the basics of Markov chain mixing time analysis. All of these standard results can be found in [MT06] and references therein.

A process is Markov if the evolution only depends on the current state rather than the full state history. Therefore the evolution of the state can be thought of as a matrix, the transition matrix, acting on a vector which represents the current distribution. We will only be interested in discrete time processes so the state after $t$ steps is given by the $t$-th power of the transition matrix acting on the initial distribution.

We say a Markov chain is irreducible if it is possible to get from one state to any other state in some number of steps. Further, a chain is aperiodic if it does not return to a state at regular intervals. If a chain is both irreducible and aperiodic then it is said to be ergodic. A well known result of Markov chain theory is that all ergodic chains converge to a unique stationary distribution. In matrix language this says that the transition matrix $P$ has eigenvalue 1 with no multiplicity and all other eigenvalues have absolute value strictly less than 1. We will also need the notion of reversibility. A Markov chain is reversible if the time reversed chain has the same transition matrix, with respect to some distribution. This condition is also known as detailed balance:

$$\pi(x) P(x, y) = \pi(y) P(y, x). \qquad (3.5.8)$$

It can be shown that a reversible ergodic Markov chain is only reversible with respect to the stationary distribution. So above $\pi(x)$ is the stationary distribution of $P$. An immediate consequence of this is that for a chain with uniform stationary distribution, it is reversible if and only if it is symmetric (i.e. $P(x, y) = P(y, x)$). Note also that reversible chains have real eigenvalues, since they are similar to the symmetric matrix $\sqrt{\frac{\pi(x)}{\pi(y)}}\, P(x, y)$ (using the similarity transform $\delta_{xy} \sqrt{\pi(x)}$).

With these definitions and concepts, we can now ask how quickly the Markov chain converges to the stationary distribution. This is normally defined in terms of the 1-norm mixing time. We use (half the) 1-norm distance to measure distances between distributions:

$$\|s - t\| = \frac{1}{2} \|s - t\|_1 = \frac{1}{2} \sum_i |s_i - t_i|. \qquad (3.5.9)$$

We assume all distributions are normalised so then $0 \le \|s - t\| \le 1$. We can now define the mixing time:

Definition 3.5.3. Let $\pi$ be the stationary distribution of $P$. Then if $P$ is ergodic the mixing time $\tau$ is

$$\tau(\epsilon) = \max_s \min \{ t \ge 0 : \|P^t s - \pi\| \le \epsilon \}. \qquad (3.5.10)$$

We will also use the (weaker) 2-norm mixing time (note this is not the same as $\tau_2$ in [MT06]):

Definition 3.5.4. Let $\pi$ be the stationary distribution of $P$. Then if $P$ is ergodic the 2-norm mixing time $\tau_2$ is

$$\tau_2(\epsilon) = \max_s \min \{ t \ge 0 : \|P^t s - \pi\|_2 \le \epsilon \}. \qquad (3.5.11)$$

Unless otherwise stated, when we say mixing time we are referring to the 1-norm mixing time.

There are many techniques for bounding the mixing time, including finding the second largest eigenvalue of $P$. This gives a good measure of the mixing time because components parallel to the second largest eigenvector decay the slowest. We have (for reversible ergodic chains)

Theorem 3.5.5 (see [MT06], Corollary 1.15).

$$\tau(\epsilon) \le \frac{1}{\Delta} \ln \frac{1}{\pi_* \epsilon}$$

where $\pi_* = \min_x \pi(x)$ and $\Delta = \min(1 - \lambda_2, 1 + \lambda_{\min})$ where $\lambda_2$ is the second largest eigenvalue and $\lambda_{\min}$ is the smallest. $\Delta$ is known as the gap.
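To make the definitions concrete, the sketch below computes the 1-norm mixing time of a two-state chain by direct iteration and compares it with the bound $\frac{1}{\Delta} \ln \frac{1}{\pi_* \epsilon}$. The chain, its parameters $a$ and $b$, and the function names are illustrative assumptions. For a two-state chain the only non-unit eigenvalue is $1 - a - b$, so the gap is known in closed form. The maximum over starting distributions is taken over point masses only, which suffices because the distance to $\pi$ is a convex function of the starting distribution.

```python
import math

def tv_distance(s, t):
    """Half the 1-norm distance between two distributions given as lists."""
    return 0.5 * sum(abs(a - b) for a, b in zip(s, t))

def step(dist, P):
    """One step of the chain: new[y] = sum_x dist[x] * P[x][y]."""
    m = len(P)
    return [sum(dist[x] * P[x][y] for x in range(m)) for y in range(m)]

def mixing_time(P, pi, eps, t_max=100_000):
    """Smallest t such that every point-mass start is within eps of pi after t steps."""
    worst = 0
    for start in range(len(P)):
        dist = [1.0 if x == start else 0.0 for x in range(len(P))]
        t = 0
        while tv_distance(dist, pi) > eps:
            dist = step(dist, P)
            t += 1
            if t > t_max:
                raise RuntimeError("chain did not mix within t_max steps")
        worst = max(worst, t)
    return worst

# Two-state chain: leave state 0 with probability a, leave state 1 with probability b.
a, b = 0.1, 0.2
P = [[1 - a, a], [b, 1 - b]]
pi = [b / (a + b), a / (a + b)]   # stationary distribution
lam = 1 - a - b                   # the only eigenvalue other than 1
gap = min(1 - lam, 1 + lam)
eps = 0.01

tau = mixing_time(P, pi, eps)
bound = (1 / gap) * math.log(1 / (min(pi) * eps))
print(tau, bound)
```

The bound is not tight for this example, but it has the right dependence on the gap: halving $a + b$ roughly doubles both numbers.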

If the chain is irreversible, it may not even have real eigenvalues. However, we can bound the mixing time in terms of the eigenvalues of the reversible matrix $PP^*$ where $P^*(x, y) = \frac{\pi(y)}{\pi(x)} P(y, x)$. In this case we have ([MT06], Corollary 1.14)

$$\tau(\epsilon) \le \frac{2}{\Delta_{PP^*}} \ln \frac{1}{\pi_* \epsilon} \qquad (3.5.12)$$

where now $\Delta_{PP^*}$ is the gap of the chain $PP^*$. Note that for a reversible chain $P = P^*$ and $\Delta_{PP^*} \approx 2\Delta$ so the bounds are approximately the same.

This can also be converted into a 2-norm mixing time bound:

$$\tau_2(\epsilon) \le \frac{2}{\Delta_{PP^*}} \ln 1/\epsilon. \qquad (3.5.13)$$

To bound the gap, we will use the comparison theorem in Theorem 3.5.6 below. In this Theorem, we are thinking of the Markov chain as a directed graph where the vertices are the states and there are edges for allowed transitions (i.e. transitions with non-zero probability). For irreducible chains, it is possible to make a path from any vertex to any other; we call the path length the number of transitions in such a path (which will in general depend on the choice of path).

Theorem 3.5.6 (see [MT06], Theorem 2.14). Let $P$ and $\tilde{P}$ be two Markov chains on the same state space $\Omega$ with the same stationary distribution $\pi$. Then, for every $x \neq y \in \Omega$ with $\tilde{P}(x, y) > 0$ define a directed path $\gamma_{xy}$ from $x$ to $y$ along edges in $P$ and let its length be $|\gamma_{xy}|$. Let $\Gamma$ be the set of all such paths. Then

$$\Delta \ge \tilde{\Delta}/A$$

for the gaps $\Delta$ and $\tilde{\Delta}$ where

$$A = A(\Gamma) = \max_{a \neq b,\ P(a,b) \neq 0} \frac{1}{\pi(a) P(a, b)} \sum_{x \neq y:\ (a,b) \in \gamma_{xy}} \pi(x)\, \tilde{P}(x, y)\, |\gamma_{xy}|.$$

For example, when comparing 1-dimensional random walks there is no choice in the paths; they must pass through every point between $x$ and $y$. Further, the walk can only progress one step at a time so (without loss of generality, for reversible chains) let $b = a + 1$ to give

$$\begin{aligned} A &= \max_a \frac{1}{\pi(a) P(a, a+1)} \sum_{x \le a} \sum_{y \ge a+1} \pi(x)\, \tilde{P}(x, y)\, (y - x) \\ &= \max_a \frac{\tilde{P}(a, a+1)}{P(a, a+1)}. \end{aligned} \qquad (3.5.14)$$
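A small worked instance of the 1-dimensional comparison may help. The sketch below assumes the comparison constant takes the form $A = \max_a \frac{1}{\pi(a) P(a,a+1)} \sum_{x \le a < y} \pi(x) \tilde{P}(x,y) (y - x)$, with only forward edges considered, and compares two lazy nearest-neighbour walks on a path. All function names are hypothetical. Both walks are symmetric, so they share the uniform stationary distribution.

```python
from fractions import Fraction

def lazy_walk(m, move):
    """Nearest-neighbour walk on {0, ..., m-1}: step to each neighbour with probability `move`."""
    P = [[Fraction(0)] * m for _ in range(m)]
    for a in range(m):
        for b in (a - 1, a + 1):
            if 0 <= b < m:
                P[a][b] = move
        P[a][a] = 1 - sum(P[a])  # remaining probability: stay put
    return P

def comparison_constant_1d(P, P_tilde, pi):
    """Comparison constant A for forward edges (a, a+1), using the unique monotone paths."""
    m = len(pi)
    best = Fraction(0)
    for a in range(m - 1):
        if P[a][a + 1] == 0:
            raise ValueError("P must allow every nearest-neighbour step")
        # every path x -> y with x <= a < y crosses the edge (a, a+1) and has length y - x
        total = sum(pi[x] * P_tilde[x][y] * (y - x)
                    for x in range(a + 1) for y in range(a + 1, m))
        best = max(best, total / (pi[a] * P[a][a + 1]))
    return best

m = 6
pi = [Fraction(1, m)] * m
P_tilde = lazy_walk(m, Fraction(1, 4))  # reference chain
P = lazy_walk(m, Fraction(1, 8))        # slower chain to be bounded
A = comparison_constant_1d(P, P_tilde, pi)
print(A)
```

In this example $P = \frac{1}{2}(I + \tilde{P})$, so the gap of $P$ is exactly half that of $\tilde{P}$ and the comparison bound $\Delta \ge \tilde{\Delta}/A$ holds with equality.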

A generalisation of the comparison theorem involves constructing flows, which are weighted sets of paths between states. This can give a tighter bound since bottlenecks are averaged over. This gives a modified comparison theorem:

Theorem 3.5.7 ([DS93], Theorem 2.3). Let $P$ and $\tilde{P}$ be two Markov chains on the same state space $\Omega$ with the same stationary distribution $\pi$. Then, for every $x \neq y \in \Omega$ with $\tilde{P}(x, y) > 0$, construct a set of directed paths $\mathcal{P}_{xy}$ from $x$ to $y$ along edges in $P$. We define the flow function $f$ which maps each path $\gamma_{xy} \in \mathcal{P}_{xy}$ to a real number in the interval $[0, 1]$ such that

$$\sum_{\gamma_{xy} \in \mathcal{P}_{xy}} f(\gamma_{xy}) = \tilde{P}(x, y).$$

Again, let the length of each path be $|\gamma_{xy}|$. Then

$$\Delta \ge \tilde{\Delta}/A$$

for the gaps $\Delta$ and $\tilde{\Delta}$ where

$$A = A(f) = \max_{a \neq b,\ P(a,b) \neq 0} \frac{1}{\pi(a) P(a, b)} \sum_{x \neq y,\ \gamma_{xy} \in \mathcal{P}_{xy}:\ (a,b) \in \gamma_{xy}} \pi(x)\, f(\gamma_{xy})\, |\gamma_{xy}|. \qquad (3.5.15)$$

Note that we recover the comparison theorem when there is just one path between each $x$ and $y$.

Yet another generalisation is to allow general length functions instead of simply counting the edges. This only appears in the literature as a comparison to the chain $\tilde{P}(x, y) = \pi(y)$ although it can easily be generalised to allow comparison with any chain.

Theorem 3.5.8 ([Kah96], Proposition 1). Let $P$ be a Markov chain on the state space $\Omega$ with stationary distribution $\pi$. Then, for every $x \neq y \in \Omega$ define a directed path $\gamma_{xy}$ from $x$ to $y$ along edges in $P$ and let its length be

$$|\gamma_{xy}|_l = \sum_{(a,b) \in \gamma_{xy}} l(a, b) \qquad (3.5.16)$$

for any positive length function $l(a, b)$, defined on the edges of the path. Let $\Gamma$ be the set of all such paths. Then

$$\Delta \ge 1/A$$

where

$$A = A(\Gamma) = \max_{a \neq b,\ P(a,b) \neq 0} \frac{1}{l(a, b)\, \pi(a) P(a, b)} \sum_{x \neq y:\ (a,b) \in \gamma_{xy}} \pi(x)\, \pi(y)\, |\gamma_{xy}|_l.$$

Decomposition

For some Markov chains, it is easier to consider different parts of the chain separately to prove convergence results. This allows, for example, different convergence techniques to be used on different parts of the chain. The separate parts are combined using the decomposition theorem:

Theorem 3.5.9 ([MR00], Theorem 4.2). Let $P(x, y)$ be the transition matrix for a reversible Markov chain with state space $\Omega$ and stationary distribution $\pi(x)$. Then let $\Omega_i$ be disjoint subsets of $\Omega$ such that $\bigcup_i \Omega_i = \Omega$. Let

$$P_i(x, y) = \begin{cases} P(x, y) & x, y \in \Omega_i,\ x \neq y \\ 1 - \sum_{y' \in \Omega_i,\ y' \neq x} P(x, y') & x = y \in \Omega_i \\ 0 & \text{otherwise.} \end{cases} \qquad (3.5.17)$$

Further let $w_i = \sum_{x \in \Omega_i} \pi(x)$ and

$$\bar{P}(i, j) = \frac{1}{w_i} \sum_{x \in \Omega_i,\ y \in \Omega_j} \pi(x) P(x, y). \qquad (3.5.18)$$

Then

$$\Delta \ge \frac{1}{2} \bar{\Delta} \min_i \Delta_i \qquad (3.5.19)$$

where $\Delta$ is the gap of $P$, $\bar{\Delta}$ for $\bar{P}$ and $\Delta_i$ for $P_i$.

log-Sobolev Constant

We will need tighter, but more complicated, mixing time results to prove the tight result for the $U(4)$ case. We use the log-Sobolev constant:

Definition 3.5.10. The log-Sobolev constant $\rho$ of a chain with transition matrix $P$ and stationary distribution $\pi$ is

$$\rho = \min_f \frac{\sum_{x \neq y} (f(x) - f(y))^2\, P(x, y)\, \pi(y)}{\sum_x \pi(x) f(x)^2 \log \frac{f(x)^2}{\sum_y \pi(y) f(y)^2}}.$$

The mixing time result is:

Lemma 3.5.11 (see [DS96], Theorem 3.7'). The mixing time of a finite, reversible, irreducible Markov chain is

$$\tau(\epsilon) = O\left( \frac{1}{\rho} \log \log \frac{1}{\pi_*} + \frac{1}{\Delta} \log \frac{d}{\epsilon} \right) \qquad (3.5.20)$$

where $\rho$ is the Sobolev constant, $\pi_*$ is the smallest value of the stationary distribution, $\Delta$ is the gap and $d$ is the size of the state space.

Further, the comparison theorem (Theorem 3.5.6) works just the same to give

$$\rho \ge \tilde{\rho}/A.$$

We will need one more result, due to Diaconis and Saloff-Coste:

Lemma 3.5.12 ([DS96], Lemma 3.2). Let $P_i$, $i = 1, \ldots, d$, be Markov chains with gaps $\Delta_i$ and Sobolev constants $\rho_i$. Now construct the product chain $P$. This chain has state space equal to the product of the spaces for the chains $P_i$ and at each step one of the chains is chosen at random and run for one step. Then $P$ has spectral gap given by:

$$\Delta = \frac{1}{d} \min_i \Delta_i$$

and Sobolev constant:

$$\rho = \frac{1}{d} \min_i \rho_i.$$

3.5.4 Convergence Proof 

We now prove the Markov chain convergence results to show that the $\gamma(p, p)$ terms converge quickly. We have already shown that the $\gamma(p_1, p_2)$ terms with $p_1 \neq p_2$ converge quickly and that there is no mixing between these terms and the $\gamma(p, p)$ terms. Therefore, in this section, we remove such terms from $\hat{G}$.

We want to prove the Markov chain with transition matrix (Eqn. 3.5.2)

$$P = \frac{1}{n(n-1)} \sum_{i \neq j} \hat{G}^{(ij)}$$

converges quickly. Firstly, we know from Section 3.4.3 that $P$ has two eigenvectors with eigenvalue 1. The first is the identity state ($\sigma_0 \otimes \sigma_0$) and the second is the uniform sum of all non-identity terms ($\frac{1}{4^n - 1} \sum_{p \neq 0} \sigma_p \otimes \sigma_p$). From now on, we remove the identity state. This makes the chain irreducible. Since we know it converges, it must be aperiodic also so the chain is ergodic and all other eigenvalues are strictly between 1 and $-1$.

We show here that the gap of this chain, up to constants, does not depend on the choice of 2-copy gapped gate set. In the second half of the chapter we find a tight bound on the gap for the $U(4)$ case which consequently gives a tight bound on the gap for all universal sets.

Since the stationary distribution is uniform, the chain is reversible if and only if $P$ is a symmetric matrix. A sufficient condition for $P$ to be symmetric is for $\hat{G}^{(ij)}$ to be symmetric. We saw in Theorem 3.4.3 that for the $U(4)$ gate set case $\hat{G}^{(ij)}$ is symmetric. In fact, the proof works identically to show that $\hat{G}^{(ij)}$ is symmetric for any gate set, provided the set is invariant under Hermitian conjugation. However, 2-copy gapped gate sets do not necessarily have this property so the Markov chain is not necessarily reversible. We will find equal bounds (up to constants) for the gaps of both $P$ (if $\hat{G}$ is symmetric) and $PP^*$ (if $\hat{G}$ is not symmetric) below:

Theorem 3.5.13. Let $\mu$ be any 2-copy gapped distribution of gates. If $\mu$ is invariant under Hermitian conjugation then let $\Delta_P$ be the eigenvalue gap of the resulting Markov chain matrix $P$. Then

$$\Delta_P = \Omega(\Delta_{U(4)}) \qquad (3.5.21)$$

where $\Delta_{U(4)}$ is the eigenvalue gap of the $U(4)$ chain. If $\mu$ is not invariant under Hermitian conjugation then let $\Delta_{PP^*}$ be the eigenvalue gap of the resulting Markov chain matrix $PP^*$. Then

$$\Delta_{PP^*} = \Omega(\Delta_{U(4)}). \qquad (3.5.22)$$

Proof. We will use the comparison method with flows (Theorem 3.5.7). Firstly consider the case where $\mu$ is closed under Hermitian conjugation i.e. $\hat{G}$ is symmetric.

We will compare $P$ to the $U(4)$ chain, which we call $P_{U(4)}$. Recall that this chain chooses a pair at random and does nothing if the pair is 00 and chooses a random state from $\{0, 1, 2, 3\}^2 \setminus \{00\}$ otherwise.

To apply Theorem 3.5.7 we need to construct the flows between transitions in $P_{U(4)}$. We will choose paths such that only one pair is modified throughout. For example (with $n = 4$), the transition $1000 \to 2000$ is allowed in $P_{U(4)}$. To construct a path in $P$, we need to find allowed transitions between these two states in $P$. $\hat{G}$ may not include the transition $10 \to 20$ directly, however, $\hat{G}$ is irreducible on this subspace of just two pairs. This means that a path exists and can be of maximum length 14 if it has to cycle through all intermediate states (in fact, since $\hat{G}$ is symmetric the maximum path length is 8; all that is important here is that it is constant). For example, the transitions $10 \to 11 \to 20$ might be allowed. Then we could choose the full path to be $1000 \to 1100 \to 2000$. In this case we have chosen the path to involve transitions pairing sites 1 and 2. However, we could equally well have chosen any pairing; we could pair the first site with any of the others. We can choose 3 paths in this way. For this example, the flow we want to choose will be all 3 of these paths equally weighted. We now use this idea to construct flows between all transitions in $P_{U(4)}$ to prove the result.

Let $x \neq y \in \Omega$ and let $d(x, y)$ be the Hamming distance between the states ($d(x, y)$ gives the number of places at which $x$ and $y$ differ). There are two cases where $P_{U(4)}(x, y) \neq 0$:

1. $d(x, y) = 2$. Here we must choose a unique pairing, specified by the two sites that differ. Make all transitions in $P$ using this pair giving just one path.

2. $d(x, y) = 1$. For this case, choose all possible pairings of the changing site that give allowed transitions in $P_{U(4)}$. For each pairing, construct a path in $P$ modifying only this pair. If the differing site is initially non-zero then there are $n - 1$ such pairings; if the differing site is initially zero then there are $n - z(x)$ pairings where $z(x)$ is the number of zeroes in the state $x$.

All the above paths are of constant length since we have to (at most) cycle through all states of a pair. We must now choose the weighting $f(\gamma_{xy})$ for each path such that

$$\sum_{\gamma_{xy} \in \mathcal{P}_{xy}} f(\gamma_{xy}) = P_{U(4)}(x, y) \qquad (3.5.23)$$

where $\mathcal{P}_{xy}$ is the set of all paths from $x$ to $y$ constructed above. We choose the weighting of each path to be uniform. We just need to calculate the number of paths in $\mathcal{P}_{xy}$ to find $f$:

1. $d(x, y) = 2$. There is just one path so $f(\gamma_{xy}) = P_{U(4)}(x, y) = \Theta(1/n^2)$.

2. $d(x, y) = 1$. If the differing site is initially non-zero then $P_{U(4)}(x, y) = \Theta(1/n)$ and there are $n - 1$ paths so $f(\gamma_{xy}) = \frac{P_{U(4)}(x, y)}{n - 1} = \Theta(1/n^2)$. If the differing site is initially zero then $P_{U(4)}(x, y) = \Theta\left( \frac{n - z(x)}{n^2} \right)$ and there are $n - z(x)$ paths so $f(\gamma_{xy}) = \frac{P_{U(4)}(x, y)}{n - z(x)} = \Theta(1/n^2)$.

So for all paths, $f = \Theta(1/n^2)$. We now just need to know how many times each edge $(a, b)$ in $P$ is used to calculate $A$:

$$A = \max_{a \neq b,\ P(a,b) \neq 0} A(a, b) \qquad (3.5.24)$$

where

$$A(a, b) = \frac{1}{P(a, b)} \sum_{x \neq y,\ \gamma_{xy} \in \mathcal{P}_{xy}:\ (a,b) \in \gamma_{xy}} f(\gamma_{xy}).$$

We have cancelled the factors of $\pi(x)$ because the stationary distribution is uniform. We have also ignored the lengths of the paths since they are all constant.

To evaluate $A(a, b)$, we need to know how many paths pass through each edge $(a, b)$. We again consider the two possibilities separately:

1. $d(a, b) = 2$. Suppose $a$ and $b$ differ at sites $i$ and $j$. Firstly, we need to count how many transitions from $x$ to $y$ in $P_{U(4)}$ could use this edge, and then how many paths for each transition actually use the edge.

To find which $x$ and $y$ could use the edge, note that $x$ and $y$ must differ at sites $i$, $j$ or both. Furthermore, the values at the sites other than $i$ and $j$ must be the same as for $a$ (and therefore $b$). There is a constant number of $x, y$ pairs that satisfy this condition. Now, for each $x, y$ pair satisfying this, paths that use this edge must use the pairing $i, j$ for all transitions. Since in the paths we have chosen above there is a unique path from $x$ to $y$ for each pairing, there is at most one path for each $x, y$ pair that uses edge $(a, b)$.

For $d(a, b) = 2$, $P(a, b) = \Omega(1/n^2)$ so $A(a, b)$ is a constant for this case.

2. $d(a, b) = 1$. Let there be $r$ pairings that give allowed transitions in $P$ between $a$ and $b$. As above, each pairing gives a constant number of paths. So the numerator is $O(r/n^2)$. Further, $P(a, b) = \Omega(r/n^2)$. So again $A(a, b)$ is constant.

Combining, $A$ is a constant so the result is proven for the case $\hat{G}$ is symmetric.

We now turn to the irreversible case. We now need to bound the gap of $PP^* = PP^T$. This chain selects two (possibly overlapping) pairs at random and applies $\hat{G}$ to one of them and $\hat{G}^T$ to the other. We can use the above exactly by choosing $\hat{G}$ to perform the transitions above and $\hat{G}^T$ to just loop the states back to themselves. By aperiodicity (the greatest common divisor of loop lengths is 1), we can always find constant length paths that do this. □

Now we need to know the gap of the U(4) chain. We can, by a simple application 
of the comparison theorem, show it is $\Omega(1/n^2)$. However, in the second half of this 
chapter we show it is $\Theta(1/n)$. This gives us (using Theorem 3.5.5): 

Corollary 3.5.14. The Markov chain P has mixing time $O(n(n + \log 1/\epsilon))$ and 
2-norm mixing time $O(n \log 1/\epsilon)$. 

We conjecture that the mixing time (as well as Lemma 3.5.2) can be tightened to 
$O(n \log \frac{n}{\epsilon})$, which is asymptotically the same as for the U(4) case: 

Conjecture 3.5.15. The second moments for the case of general 2-copy gapped 
distributions have 1-norm mixing time $O(n \log \frac{n}{\epsilon})$. 

It seems likely that an extension of our techniques in Section 3.6 could be used to 
prove this. 

Combining the convergence results we have proved our general result Lemma 3.3.2: 

Proof of Lemma 3.3.2. Combining Corollary 3.5.14 (for the $\gamma(p, p)$ terms) and Lemma 
3.5.2 (for the $\gamma(p_1, p_2)$, $p_1 \neq p_2$ terms) proves the result. □ 

We have now shown that the first and second moments of random circuits converge 
quickly. For the remainder of the chapter we prove the tight bound for the gap and 
mixing time of the U(4) case and show how mixing time bounds relate to the closeness 
of the 2-design to an exact design. Only for the U(4) case is the matrix G a projector 
so in this sense the U(4) random circuit is the most fundamental. While we expect 
the above mixing time bound is not tight, we can prove a tight mixing time result 
for the U(4) case. However, using our definition of an approximate k-design, the gap 
rather than the mixing time governs the degree of approximation. 



3.6 Tight Analysis for the U(4) Case 

We have already found tight bounds for the first moments in Lemma 3.5.1: just set 
Δ = 1. 

3.6.1 Second Moments Convergence 

We need to prove a result analogous to Lemma 3.5.2 for the terms $\sigma_{p_1} \otimes \sigma_{p_2}$ where 
$p_1 \neq p_2$. We already have a tight bound for the 2-norm decay, by setting Δ = 1 in 
Lemma 3.5.2. We tighten the 1-norm bound: 

Lemma 3.6.1. After $O(n \log \frac{n}{\epsilon})$ steps 

$$\sum_{p_1 \neq p_2} \mathbb{E}_W |\gamma_W(p_1, p_2)| \leq \epsilon. \qquad (3.6.1)$$

Proof. We will split the random circuits up into classes depending on how many qubits 
have been hit. Let H be the random variable giving the number of different qubits 
that have been hit. We can work out the distribution of H and bound the sum of 
$|\gamma_W(p_1, p_2)|$ for each outcome. 
Firstly we have, after t steps, 
-(-^'")<:)(t^)'<:)<'-/")' 

Now, for each qubit hit, each coefficient which has $p_1$ and $p_2$ differing in this place is 
set to zero. So after h have been hit, there are only (at most) $16^{n-h}$ terms in the sum 
in Eqn. 3.6.1. As before, the state is a physical state, $\operatorname{tr} \rho^2 \leq 1$, so $\sum_{p_1 p_2} \gamma_W^2(p_1, p_2) \leq 1$ 
so $\sum_{p_1 p_2} |\gamma_W(p_1, p_2)| \leq \sqrt{N}$ if there are at most N non-zero terms. Therefore 
we have, after t steps, 

n-1 

^ ¥.w\iw{pi,P2)\ <Y,nH = Mie^"-'^)/^ 

n-1 

i-h) 



h=l 

n-1 , X 

h=\ ^ ^ 

n-1 / ^ 

h=\ ^ ^ 

n-1 / X 



h ^ n — h 



n-1 

< 

h=l 



Now, let t = nln -: 



Pi7^P2 h=l 



/ \ / A \ h 

s—^ I n\ / 4e 



n 



n I \n 
where the last line follows from the binomial theorem. □ 

This, combined with the mixing time result we prove below, completes the proof 
that the second moments of the random circuit converge in time $O(n \log \frac{n}{\epsilon})$. 

3.6.2 Markov Chain of Coefficients 

The Markov chain acting on the coefficients is reducible because the state $0^n$ is 
isolated. However, if we remove it then the chain becomes irreducible. The presence 
of self loops implies aperiodicity, therefore the chain is ergodic. We have already seen 
that the chain converges to the Haar uniform distribution (in Section [3. therefore 
the stationary state is the uniform state $\pi(x) = 1/(4^n - 1)$. Further, since the chain is 
symmetric and has uniform stationary distribution, the chain satisfies detailed balance 
(Eqn. 3.5.8) so is reversible. We now turn to obtaining bounds on the mixing time of 
this chain. 

We want to show that the full chain converges to stationarity in time $O(n \log \frac{n}{\epsilon})$. 
To prove this, we will construct another chain called the zero chain. This is the chain 
that counts the number of zeroes in the state. Since it is the zeroes that slow down 
the mixing, this chain will accurately describe the mixing time of the full chain. 

Lemma 3.6.2. The zero chain has transition matrix P on state space (we count 
non-zero positions) $\Omega = \{1, 2, \ldots, n\}$: 

$$P(x,y) = \begin{cases} 1 - \frac{2x(3n-2x-1)}{5n(n-1)} & y = x \\ \frac{2x(x-1)}{5n(n-1)} & y = x - 1 \\ \frac{6x(n-x)}{5n(n-1)} & y = x + 1 \\ 0 & \text{otherwise} \end{cases} \qquad (3.6.2)$$

for $1 \leq x, y \leq n$. 



Proof. Suppose there are n − x zeroes (so there are x non-zeroes). Then the only way 
the number of zeroes can decrease (i.e. for x to increase) is if a non-zero item is paired 
with a zero item and one of the 9 (out of 15) new states is chosen with no zeroes. The 
probability of choosing such a pair is $\frac{2x(n-x)}{n(n-1)}$ so the overall probability is $\frac{9}{15} \cdot \frac{2x(n-x)}{n(n-1)}$. 

The number of zeroes can increase only if a pair of non-zero items is chosen and one 
of the 6 states is chosen with one zero. The probability of this occurring is $\frac{6}{15} \cdot \frac{x(x-1)}{n(n-1)}$. 

The probability of the number of zeroes remaining unchanged is simply calculated 
by requiring the probabilities to sum to 1. □ 
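As a sanity check on the counting argument above, the following minimal sketch builds the zero chain with exact rational arithmetic. The function names `zero_chain` and `moving_probability` are ours (hypothetical helpers, not from the thesis), and the entries assume the probabilities $\frac{6x(n-x)}{5n(n-1)}$ (one fewer zero) and $\frac{2x(x-1)}{5n(n-1)}$ (one more zero) obtained from the pair-counting argument.

```python
from fractions import Fraction


def zero_chain(n):
    """Zero chain on x = 1..n non-zero sites; entry [x-1][y-1] is P(x, y)."""
    denom = 5 * n * (n - 1)
    P = [[Fraction(0)] * n for _ in range(n)]
    for x in range(1, n + 1):
        # A pair of non-zero sites is chosen and gains one zero: (6/15) * x(x-1)/(n(n-1)).
        down = Fraction(2 * x * (x - 1), denom)
        # A mixed (zero, non-zero) pair is chosen and loses its zero: (9/15) * 2x(n-x)/(n(n-1)).
        up = Fraction(6 * x * (n - x), denom)
        if x > 1:
            P[x - 1][x - 2] = down
        if x < n:
            P[x - 1][x] = up
        # The diagonal is fixed by requiring each row to sum to 1.
        P[x - 1][x - 1] = 1 - down - up
    return P


def moving_probability(n, x):
    """Probability 1 - P(x, x) that the zero chain leaves position x."""
    return 1 - zero_chain(n)[x - 1][x - 1]
```

With exact fractions, one can confirm for small n that every row sums to 1 and that the moving probability equals $2x(3n-2x-1)/(5n(n-1))$, which is at least $2x/(5n)$.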

We see that the zero chain is a one-dimensional random walk on the line. It is a 
lazy random walk because the probability of moving at each step is < 1. However, as 
the number of zeroes decreases, the probability of moving increases monotonically: 

$$1 - P(x,x) = \frac{2x(3n-2x-1)}{5n(n-1)} \geq \frac{2x}{5n}. \qquad (3.6.3)$$

Lemma 3.6.3. The stationary distribution of the zero chain is 

$$\pi_0(x) = \frac{3^x \binom{n}{x}}{4^n - 1}. \qquad (3.6.4)$$

Proof. This can be proven by multiplying the transition matrix in Lemma 3.6.2 by 
the state Eqn. 3.6.4. Alternatively, it can be proven by counting the number of states 
with n − x zeroes. There are $\binom{n}{x}$ ways of choosing which sites to make non-zero and 
each non-zero site can be one of three possibilities: 1, 2 or 3. The total number of 
states is $4^n - 1$, which gives the result. □ 
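The first route in the proof can be checked mechanically. The sketch below is illustrative only (the helper names `up`, `down`, `pi0` and `satisfies_detailed_balance` are ours): it verifies that $\pi_0(x) = 3^x \binom{n}{x}/(4^n - 1)$ is normalised and satisfies detailed balance with the nearest-neighbour transition probabilities of the zero chain, which implies stationarity for a birth-and-death chain.

```python
from fractions import Fraction
from math import comb


def up(n, x):
    """P(x, x+1) for the zero chain."""
    return Fraction(6 * x * (n - x), 5 * n * (n - 1))


def down(n, x):
    """P(x, x-1) for the zero chain."""
    return Fraction(2 * x * (x - 1), 5 * n * (n - 1))


def pi0(n, x):
    """Candidate stationary weight of the state with x non-zero sites."""
    return Fraction(3**x * comb(n, x), 4**n - 1)


def satisfies_detailed_balance(n):
    """Check pi0(x) P(x, x+1) == pi0(x+1) P(x+1, x) for all neighbouring x."""
    return all(
        pi0(n, x) * up(n, x) == pi0(n, x + 1) * down(n, x + 1)
        for x in range(1, n)
    )
```

Detailed balance is a stronger statement than stationarity, so this also confirms that the zero chain is reversible.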

Below we will prove the following theorem: 

Theorem 3.6.4. The zero chain mixes in time $O(n \log \frac{n}{\epsilon})$. 

We prove this using direct arguments about the convergence of the random walk. 
However, we also include a less complex method that only bounds the gap: 

Theorem 3.6.5. The zero chain has gap $\Omega(1/n)$. 

This only implies the mixing time is $O(n(n + \log 1/\epsilon))$ which is weaker than 
Theorem 3.6.4, although still sufficient to prove our main result Theorem 3.3.1 
using a modification of Corollary 3.6.7 to show that the full chain mixing time is 
$O(n(n + \log 1/\epsilon))$. 

Knowing the gap allows us to easily work out the 2-norm mixing time: 

Theorem 3.6.6. The zero chain has 2-norm mixing time $O(n \log 1/\epsilon)$. 

Proof. Use the bound on the gap in Theorem 3.6.5 and Eqn. 3.5.13. □ 

Before proving Theorem 3.6.4 we will show how the mixing time of the full chain 
follows from this. 

Corollary 3.6.7. The full chain mixes in time $O(n \log \frac{n}{\epsilon})$. 

Proof. Once the zero chain has approximately mixed, the distribution of zeroes is 
almost correct. We need to prove that the distribution of non-zeroes is correct after 
$O(n \log \frac{n}{\epsilon})$ steps too. 

Once each site of the full chain has been hit, meaning it is chosen and paired 
with another site so not both equal zero, the chain has mixed. This is because, after 
each site has been hit, the probability distribution over the states is uniform. When 
the zero chain has approximately mixed, a constant fraction of sites are zero so the 
probability of hitting a site at each step is $\Theta(1/n)$. By the coupon collector argument, 
each site will have been hit with probability at least $1 - \epsilon$ in time $O(n \log \frac{n}{\epsilon})$. 
Once the zero chain has mixed to $\epsilon'$, we can run the full chain this extra number of 
steps to ensure each site has been hit with high probability. Since the mixing of the 
zero chain only increases with time, the distance to stationarity of the full chain is 
now $1 - \epsilon - \epsilon'$. We make this formal below. 

After $t_0 = O(n \log \frac{n}{\epsilon'})$ steps, the number of zeroes is $\epsilon'$-close to the stationary 
distribution $\pi_0$ by Theorem 3.6.4 and only gets closer with more steps since the 
distance to stationarity decreases monotonically. The stationary distribution Eqn. 3.6.4 
is approximately a Gaussian peaked at 3n/4 with O(n) variance. This means that, 
with high probability, the number of non-zeroes is close to 3n/4. We will in fact only 
need that there is at least a constant fraction of non-zeroes; with probability at least 
$1 - \epsilon' - \exp(-O(n))$ there will be at least n/2. 

To prove the mixing time, we run the chain for time $t_0$ so the zero chain mixes to 
$\epsilon'$. Then run for $t_1$ additional steps. Let $H_{i,t}$ be the event that site i is hit at step t. 
Let $H_i = \bigcup_{t=t_0+1}^{t_0+t_1} H_{i,t}$ and $H = \bigcap_{i=1}^{n} H_i$. We want to show P(H) is close to 1, or, in 
other words, that all sites are hit with high probability. Further let $X_t$ be the random 
variable giving the number of non-zeroes at step t. 

If at step t − 1 site i is non-zero then the event $H_{i,t}$ occurs if the qubit is chosen, 
which occurs with probability 2/n. If, however, it was zero then it must be paired 
with a non-zero site for $H_{i,t}$ to hold. Conditioned on any history with $X_{t-1} \geq n/2$, 
this probability is $\geq 1/n$. In particular, we can condition on not having previously 
hit i and the bound does not change. Combining we have 

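$$P\left(H_{i,t}^c \,\middle|\, [X_{t-1} \geq n/2] \cap \bigcap_{t'=t_0+1}^{t-1} H_{i,t'}^c\right) \leq 1 - \frac{1}{n}.$$

Then, after $t_1$ extra steps, 

$$P\left(H_i^c \cap \bigcap_{t=t_0}^{t_0+t_1-1} [X_t \geq n/2]\right) \leq (1 - 1/n)^{t_1}$$

which, using the union bound, gives 

$$P\left(H^c \cap \bigcap_{t=t_0}^{t_0+t_1-1} [X_t \geq n/2]\right) \leq n(1 - 1/n)^{t_1}.$$

Now, since the zero chain has mixed to $\epsilon'$, 

$$P\left(\left(\bigcap_{t=t_0}^{t_0+t_1-1} [X_t \geq n/2]\right)^c\right) \leq t_1\left(\sum_{x < n/2} \pi_0(x) + \epsilon'\right) \leq t_1\left(\exp(-O(n)) + \epsilon'\right)$$

so 

$$P(H^c) \leq n(1 - 1/n)^{t_1} + t_1\left(\exp(-O(n)) + \epsilon'\right).$$

Now, choose $t_1 = n \ln \frac{n}{\epsilon}$ so that $P(H^c) \leq \delta$ where $\delta = \epsilon + t_1(\exp(-O(n)) + \epsilon')$. 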
Choose e = 1/n and e' = 1/n^ so that S is l/poly(n). Now, using the bound on 






$P(H^c)$, we can write the state v after $t_1 = O(n \log n)$ steps as 

$$v = (1 - \delta)\pi + \delta\pi'$$

where π is the stationary distribution and π′ is any other distribution. Using this, 

$$\|v - \pi\| \leq \delta.$$

We now apply Lemma 3.9.13 to show that after $O(n \log \frac{n}{\epsilon})$ steps the distance to 
stationarity of the full chain is ε. □ 
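The coupon collector step of this argument is easy to check numerically. The sketch below assumes the choice $t_1 = n \ln(n/\epsilon)$ and the per-step hitting probability of at least $1/n$ used in the proof; `extra_steps` and `miss_bound` are hypothetical helper names introduced only for illustration.

```python
import math


def extra_steps(n, eps):
    """Number of extra steps t_1 = ceil(n ln(n / eps))."""
    return math.ceil(n * math.log(n / eps))


def miss_bound(n, t1):
    """Union bound n (1 - 1/n)^{t_1} on the probability that some site is never hit."""
    return n * (1 - 1 / n) ** t1
```

Since $(1 - 1/n)^{t_1} \leq e^{-t_1/n}$, running for `extra_steps(n, eps)` steps brings the union bound below `eps`, which is the first term of δ in the proof.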

3.6.3 Proof of Theorem 3.6.4 

We will now proceed to prove Theorem 3.6.4. We present an outline of the proof here; 
the details are in Section 3.9.1. 

Firstly, note that by the coupon collector argument, the lower bound on the time 
is $\Omega(n \log n)$. We need to prove an upper bound equal to this. Intuition says that 
the mixing should take time $O(n \log n)$ because the walk has to move a distance 
$\Theta(n)$ and the waiting time at each step is proportional to n, n/2, n/3, . . . which sums 
to $O(n \log n)$, provided each site is not hit too often. We will show that this intuition 
is correct using Chernoff bound and log-Sobolev (see later) arguments. 

We will first work out concentration results of the position after some number of 
accelerated steps. The zero chain has some probability of staying still at each step. 
The accelerated chain is the zero chain conditioned on moving at each step. We define 
the accelerated chain by its transition matrix: 

Definition 3.6.8. The transition matrix for the accelerated chain is 

$$P_a(x,y) = \begin{cases} 0 & y = x \\ \frac{x-1}{3n-2x-1} & y = x - 1 \\ \frac{3(n-x)}{3n-2x-1} & y = x + 1 \\ 0 & \text{otherwise.} \end{cases} \qquad (3.6.5)$$



We use the accelerated chain in the proof to firstly prove the accelerated chain 
mixes quickly, then to bound the waiting time at each step to obtain a mixing time 
bound for the zero chain. 
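The accelerated chain is simply the zero chain conditioned on moving, so its entries are the zero chain's off-diagonal probabilities divided by the total moving probability. A minimal sketch under that reading (the function names are ours, not from the thesis):

```python
from fractions import Fraction


def zero_chain_moves(n, x):
    """Return (P(x, x-1), P(x, x+1)) for the zero chain."""
    denom = 5 * n * (n - 1)
    return Fraction(2 * x * (x - 1), denom), Fraction(6 * x * (n - x), denom)


def accelerated_moves(n, x):
    """Condition the zero chain on moving: returns (P_a(x, x-1), P_a(x, x+1))."""
    down, up = zero_chain_moves(n, x)
    moving = down + up
    return down / moving, up / moving
```

The factor $2x/(5n(n-1))$ cancels in the ratio, which is why the accelerated probabilities depend only on $x - 1$, $3(n - x)$ and their sum $3n - 2x - 1$.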

To prove the mixing time bound, we will split the walk up into three phases. We 
will split the state space into three (slightly overlapping) parts and the phase can 
begin at any point within that space. So each phase has a state space $\Omega_i \subset [1, n]$, 
an entry space $E_i \subset \Omega_i$ and an exit condition $T_i$. We say that a phase completes 
successfully if the exit condition is satisfied in time $O(n \log n)$ for an initial state 
within the entry space. When the exit condition is satisfied, the walk moves onto the 
next phase. 

The phases are: 

1. $\Omega_1 = [1, n^\delta]$ for some constant δ with $0 < \delta < 1/2$. $E_1 = \Omega_1$ (i.e. it can 
start anywhere) and $T_1$ is satisfied when the walk reaches $n^\delta$. For this part, 
the probability of moving backwards (gaining zeroes) is $O(n^{\delta-1})$ so the walk 
progresses forwards at each step with high probability. This is proven in Lemma 
3.9.6. We show that the waiting time is $O(n \log n)$ in Lemma 3.9.7. 

2. $\Omega_2 = [n^\delta/2, \theta n]$ for some constant θ with $0 < \theta < 3/4$. $E_2 = [n^\delta, \theta n]$ and 
$T_2$ is satisfied when the walk reaches θn. Here the walk can move both ways 
with constant probability but there is a $\Omega(1)$ forward bias. Here we use a 
monotonicity argument: the probability of moving forward at each step is 

$$p(x) = \frac{3(n-x)}{3n-2x-1} \geq \frac{3(n-x)}{3n-2x} \geq \frac{3(1-\theta)}{3-2\theta}.$$

If we model this random walk as a walk with constant bias equal to $\frac{3(1-\theta)}{3-2\theta}$ we 
will find an upper bound on the mixing time since mixing time increases 
monotonically with decreasing bias. Further, the waiting time at x = a stochastically 
dominates the waiting time at x = b for b > a. The true bias decreases with 
position so the walk with constant bias spends more time at the early steps. Thus 
the position of this simplified walk is stochastically dominated by the position of 
the real walk while the waiting time stochastically dominates the waiting time 
of the real walk. 

3. $\Omega_3 = [\frac{\theta}{2} n, n]$ and $E_3 = [\theta n, n]$. $T_3$ is satisfied when this restricted part of the 
chain has mixed to distance ε. Here the bias decreases to zero as the walk 
approaches 3n/4 but the moving probability is a constant. We show that this 
walk mixes quickly by bounding the log-Sobolev constant of the chain. 

Showing these three phases complete successfully will give a mixing time bound for 
the whole chain. 
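The monotonicity argument for phase 2 can be illustrated numerically. The sketch below assumes the forward probability $p(x) = 3(n-x)/(3n-2x-1)$ of the accelerated chain and the constant lower bound $3(1-\theta)/(3-2\theta)$ for $x \leq \theta n$; the helper names are ours.

```python
from fractions import Fraction


def forward_probability(n, x):
    """Probability that the accelerated chain moves from x to x + 1."""
    return Fraction(3 * (n - x), 3 * n - 2 * x - 1)


def constant_bias_probability(theta):
    """Lower bound 3(1 - theta)/(3 - 2 theta) on p(x) for x <= theta * n."""
    theta = Fraction(theta)
    return 3 * (1 - theta) / (3 - 2 * theta)


def drift(theta):
    """Drift p - q = 2p - 1 of the constant-bias comparison walk."""
    return 2 * constant_bias_probability(theta) - 1
```

The drift simplifies to $(3 - 4\theta)/(3 - 2\theta)$, which is positive exactly when $\theta < 3/4$ and vanishes at $\theta = 3/4$, where the stationary distribution is peaked.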

We now prove in Section 3.9 that the phases complete successfully with probability 
at least 1 − 1/poly(n): 

Lemma 3.6.9. 

$$P(\text{Phase 1 completes successfully}) \geq 1 - n^{2\delta-1} - 2n^{-\delta}.$$

Lemma 3.6.10. 



P(Phase 2 completes successfully) > 



1 - exp ( -|/x6'n) - ( - \ ^ L - {q/pT^^^ 



3"^ J \enj l-exp(-/i/2) 



where /x = ^^^29^ ~ 
Lemma 3.6.11. 



P(Phase 3 completes successfully) > 1 



We can now finally combine to prove our result: 



3(2 - 9) 



en/2 



Proof of Theorem 3.6.4. The stationary distribution has exponentially small weight 
in the tail with lots of zeroes. We show that, provided the number of zeroes is within 
phase 3, the walk mixes in time $O(n \log \frac{n}{\epsilon})$. We also show that if the number of zeroes 
is initially within phase 1 or 2, after $O(n \log n)$ steps the walk is in phase 3 with high 
probability. We can work out the distance to the stationary distribution as follows. 

Let $p_f$ be the probability of failure. This is the sum of the error probabilities 
in Lemmas 3.6.9, 3.6.10 and 3.6.11. The key point is that $p_f = 1/\mathrm{poly}(n)$. Then 
after $O(n \log \frac{n}{\epsilon})$ steps (the sum of the number of steps in the 3 phases), the state is 
equal to $(1 - p_f)v_3 + p_f v'$ where $v_3$ is the state in the phase 3 space and $v'$ is any 
other distribution, which occurs if any one of the phases fails. Since the distance to 
stationarity in phase 3 is ε, $\|v_3 - \pi_3\| \leq \epsilon$, where $\pi_3$ is the stationary distribution on 
the state space of phase 3. In Lemma 3.9.11 we show that $\pi_3(x) = \pi(x)/(1 - w)$ where 
$w = \sum_{x=1}^{\theta n/2 - 1} \pi(x)$. Since π(x) is exponentially small in this range, w is exponentially 
small in n. Now use the triangle inequality to find 

$$\|v_3 - \pi\| \leq \|v_3 - \pi_3\| + \|\pi_3 - \pi\|. \qquad (3.6.6)$$

Since the chain in phase 3 has mixed to ε, the first term is $\leq \epsilon$. We can evaluate 
$\|\pi_3 - \pi\|$: 

$$\|\pi_3 - \pi\| = \frac{1}{2} \sum_{x=1}^{n} |\pi_3(x) - \pi(x)|$$
$$= \frac{1}{2} \left( \sum_{x=1}^{\theta n/2 - 1} \pi(x) + \sum_{x=\theta n/2}^{n} \left( \frac{\pi(x)}{1-w} - \pi(x) \right) \right)$$
$$= \frac{1}{2} \left( w + 1 - (1 - w) \right) = w.$$

So now, 

$$\|(1 - p_f)v_3 + p_f v' - \pi\| = \|(1 - p_f)(v_3 - \pi) + p_f(v' - \pi)\|$$
$$\leq (1 - p_f)\|v_3 - \pi\| + p_f\|v' - \pi\|$$
$$\leq (1 - p_f)(\epsilon + w) + p_f$$
$$\leq \delta$$

where $\delta = \epsilon + w + p_f$. We are free to choose ε: choose it to be 1/n so that δ is 
1/poly(n). So now the running time to get a distance δ is $t = O(n \log n)$. We then 
apply Lemma 3.9.13 to obtain the result. 

This concludes the proof of Theorem 3.6.4 so Corollary 3.6.7 is proved. □ 

We have now proven Lemma 3.3.2 and consequently Corollary 3.3.3. We are now 
ready to show how Theorem 3.3.1 follows, but first give the alternative proof that the 
zero chain gap is $\Omega(1/n)$. The remainder of the proof of Theorem 3.3.1 is in Section 
3.7. 



3.6.4 Proof of Theorem 3.6.5 

Here we prove that the gap of the zero chain is $\Omega(1/n)$. While this can be deduced 
from Theorem 3.6.4 and provides weaker mixing time bounds, this bound on the gap 
is sufficient to prove our main result so we present it as a simpler alternative proof. 
We use the method of decomposition (Theorem 3.5.9), whereby the Markov chain 
is split up into disjoint state spaces. This works well here because, for the first part of 
the walk with many zeroes, the walker remains stationary most of the time whereas 
when there is a constant fraction of zeroes, the walker moves on most steps. Using the 
decomposition method allows us to use different techniques in these different regimes. 

We therefore divide the walk up into two parts, $P_1$ and $P_2$, which are shown 
in Figure 3.2. The chain $\bar{P}$ is the chain that links the two parts, according to the 
decomposition theorem, Theorem 3.5.9. 



Figure 3.2: The decomposition of the zero chain into $P_1$ and $P_2$. The graph plotted 
is the zero chain stationary distribution π(x). 

• $P_1$: Let $P_1$ have state space $\Omega_1 = \{1, \ldots, m\}$. This chain has transition matrix 

$$P_1(x,y) = \begin{cases} 0 & x > m \text{ or } y > m \\ 1 - P(m, m-1) & x = y = m \\ P(x,y) & \text{otherwise} \end{cases} \qquad (3.6.7)$$

and stationary distribution 

$$\pi_1(x) = \frac{\pi(x)}{b_m} \qquad (3.6.8)$$

where 

$$b_m = \sum_{x=1}^{m} \pi(x). \qquad (3.6.9)$$

• $P_2$: Let $P_2$ be on state space $\Omega_2 = \{m + 1, \ldots, n\}$. This chain has transition 
matrix 

$$P_2(x,y) = \begin{cases} 0 & x \leq m \text{ or } y \leq m \\ 1 - P(m+1, m+2) & x = y = m + 1 \\ P(x,y) & \text{otherwise} \end{cases} \qquad (3.6.10)$$

and stationary distribution 

$$\pi_2(x) = \frac{\pi(x)}{c_m} \qquad (3.6.11)$$

where 

$$c_m = \sum_{x=m+1}^{n} \pi(x) = 1 - b_m. \qquad (3.6.12)$$

We will take m = θn where 0 < θ < 0.49 (we could in principle just have θ < 3/4 
but this restriction makes the calculations simpler; see Lemma 3.6.12 for the origin 
of the upper bound on θ). Note that $P_2$ is the same as phase 3 used in the direct 
mixing time proof (up to relabelling m + 1 to m). Therefore we already have, from 
the proof of Lemma 3.9.11, that the gap $\Delta_2$ of $P_2$ is $\Omega(1/n)$. To find the gap of the 
whole zero chain we need to find the gaps $\Delta_1$ and $\bar{\Delta}$. An ingredient to proving this 
is an exponential bound on the tail of the stationary distribution: 



Lemma 3.6.12 

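and for $\theta < \theta_0 \approx 0.49$, $\pi(\theta n) = e^{-\Omega(n)}$. 

Proof. Use $\binom{n}{x} \leq \left(\frac{en}{x}\right)^x$ and $4^n - 1 \geq 4^n/2$ to prove the bound. When $\frac{1}{4}\left(\frac{3e}{\theta}\right)^\theta < 1$, 
$\pi(\theta n)$ is exponentially small. $\theta_0$ is the solution to $\frac{1}{4}\left(\frac{3e}{\theta}\right)^\theta = 1$. □ 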
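The threshold $\theta_0$ has no closed form, but it can be located numerically. The sketch below assumes the condition takes the form $\frac{1}{4}(3e/\theta)^\theta = 1$, which is what the bound $\binom{n}{x} \leq (en/x)^x$ applied to $3^x \binom{n}{x} / 4^n$ at $x = \theta n$ gives; the function names and the bisection bracket $[0.1, 1]$ are our own choices.

```python
import math


def log_decay_rate(theta):
    """log of (1/4) (3e/theta)^theta; negative means pi(theta n) decays exponentially."""
    return theta * math.log(3 * math.e / theta) - math.log(4)


def theta_0(tol=1e-12):
    """Solve (1/4) (3e/theta)^theta = 1 by bisection on [0.1, 1]."""
    lo, hi = 0.1, 1.0  # the rate is negative at 0.1, positive at 1, and increasing in between
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if log_decay_rate(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

The derivative of the rate is $\ln(3/\theta)$, which is positive on the bracket, so the root is unique there and bisection is safe.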

From this we can bound the gap of $\bar{P}$: 

Lemma 3.6.13. The gap of $\bar{P}$ is $\Omega(1/n)$. 

Proof. We first need to work out the transition matrix for $\bar{P}$. From the definition of 
$\bar{P}$ in the decomposition theorem, 

$$\bar{P}(1,2) = \frac{1}{b_m} \pi(m) P(m, m+1)$$
$$\bar{P}(2,1) = \frac{1}{c_m} \pi(m+1) P(m+1, m).$$

We can find the two diagonal elements using the fact that the transition matrix is 
stochastic. Because the zero chain is reversible, $\bar{P}$ is also reversible and by direct 
calculation of the eigenvalues has gap 

$$\bar{\Delta} = 1 - |1 - (\bar{P}(1,2) + \bar{P}(2,1))|. \qquad (3.6.14)$$

However, we can remove the modulus signs since, for n large enough, $\bar{P}(1,2) + 
\bar{P}(2,1) \leq 1$. This is because, using reversibility and $\pi(m) \leq b_m$, 

$$\bar{P}(1,2) + \bar{P}(2,1) = \frac{\pi(m) P(m, m+1)}{b_m c_m} \leq \frac{P(m, m+1)}{c_m}. \qquad (3.6.15)$$

Using Lemma 3.6.12 we find that $c_m \geq 1 - e^{-\Omega(n)}$ for m = θn and $\theta < \theta_0$. Using 
$P(m, m+1) \leq 3/5$, we find that for n large enough, $\bar{P}(1,2) + \bar{P}(2,1) \leq 1$. 

Now we need to show that $\bar{P}(1,2) + \bar{P}(2,1) = \Omega(1)$. Again using Lemma 3.6.12 
we find $\bar{P}(2,1) = e^{-\Omega(n)}$. We just need a bound on $\bar{P}(1,2)$. First we bound $\pi(m)/b_m$: 

$$\frac{\pi(m)}{b_m} = \frac{\pi(m)}{\sum_{x=1}^{m} \pi(x)} = \frac{1}{\sum_{x=1}^{m} \frac{\pi(x)}{\pi(m)}} = \frac{1}{\sum_{x=1}^{m} 3^{x-m} \binom{n}{x} / \binom{n}{m}}$$
$$\geq \frac{1}{\sum_{x=1}^{m} 3^{x-m}} \geq \frac{1}{\sum_{x=0}^{\infty} 3^{-x}} = \frac{2}{3}$$

using $\binom{n}{x} \leq \binom{n}{m}$ for $x \leq m \leq n/2$. For m = θn we have $P(m, m+1) = \frac{6\theta(1-\theta)n}{5(n-1)}$ so 
overall $\bar{P}(1,2) = \Omega(1)$, proving the bound on the gap. □ 

Lemma 3.6.14. The gap of $P_1$ is $\Omega(1/n)$. 

Proof. We use the comparison method with length functions as stated in Theorem 
13.5.81 The length function we choose is, for x < y, l{x,y) = for some constant r 
satisfying < r < 1. 
Let 

J z m J/—! 

Then, according to Theorem 13.5.81 Ai '>1/A where A = max^^^. We need to find 
an upper bound for A: 
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>^ x=l x=z+l / 

' z m 

(1_6^_(1_6^))^^-^(^)_(1_(1_6^)) ^ ^-^(^) 

>^ x=l x=z+l 

^ m z m \ 

(l-5,)^rM^)-(l-6^)^rM^)- ^ rM^) J 

>v X=l X=l X=Z+1 / 

^ m z m \ 

^va;=l a;=l x=z+l / 





— r)'n{z)P{z, z 
1 






— r)'K{z)P{z, z 
1 


+ l)hm 




— r)7r(z)P(z, z 
1 


+ l)fem 




— r)7r(z)P(2;, 2; 


+ l)&m 




— r)7r(z)P(z, z 


+ 1) 



where the inequahty comes from 1 — bz < 1. Now let = ^^^^^^ X]i=i ?'^7r(a;) then, 

plugging in the value of P{z, z + 1), we find 

. ^ 5n(n - 1) , , , 
Gz(n — z)(\ — r) 

Now, max^ Qz(n—z)(i—r) — ^('^) showing max^ h^ir) — ^(-'^) sufficient the prove 
the bound we require. We evaluate hz recursively: 
Firstly, 

z-l 



hzir) = IH 3_ r^7r(x). 

^ ' x=l 






Then evaluate the sum: 

2-1 2-1 



x=l x=l 



2-1 



3r ^-^ n — X 
1 

-y 



X 



-r^TT(x) 



Combining, 



3r ^-^ n — X + 1 

x=2 

1 ^ 

6r ^-^ n — a; + 1 

x=\ 

1 Z ^ 

< T, rry "r'^T^ix). 

3rn- z + 1 ^ ^ ^ 

x=l 



hz{r) < 1 + -7\hzir) 

Sr{n — z + 1) 



or 



hz{r) < ^ 



1 



3r(n-2+l) 

Since $1 \leq z \leq \theta n$, $h_z(r)$ is constant in this range, proving the result. □ 

We can now combine the results to prove the bound on the zero chain gap: 

Proof of Theorem 3.6.5. Using Lemmas 3.6.14, 3.9.11 and 3.6.13 together with 
Theorem 3.5.9 proves the result. □ 

3.7 Main Result 

We will now show how the mixing time results imply that we have an approximate 
2-design. 

Proof of Theorem 3.3.1. We will go via the 2-norm since this gives a tight bound 
when working with the Pauli operators. We write ρ in the Pauli basis as usual (as 
Eqn. 3.2.3) and note that ρ is not necessarily a physical state so the coefficients may 
not be real. 



\\Gw - QhWI = sup-^ WiGw O id22n)(p) - {Gh O id22n)(p)||^ 



< 2*" sup 
p 

1 

— sup 77-72 

1 



\\{Gw(S) id22n ) (p) - {Gh (E) id22n ) (p) 1 1 ^ 



P1.P2>P3'P4 



Now, write (for pip2 / 00) ^vK(w'7pi <^ (Jp^) = w I] 91,92 gt{qi,q2;Pi,P2)(^qi cTgj. 

9192 7^00 

We get 



2"(2" + 1) 



sup TT 

p IHIi 



X] 7o(pi,P2,P3,P4) [9t{qi,q2;pi,P2) 



P1,P2>P3,P4,91,92 
P1P2#00,9192 7^00 



2 ^"Pjni^ ^ |70(P1,P2,P3,P4)| I 5t(9l,92;Pl,P2) - ^77^— I 

IIPIIl P1,P2.P3>P4.91,92 ^ \ ' )/ 



<2^"e2g^p 



< 2^"e2 sup 



24ng2 



■ P3 

P1P2 7^00>9192 7^00 

X;P1,P2,P3.P4 |70(Pl,P2,P3,P4)|' 

^ PlP2 7^00 

2 
1 

2 
2 



where the first equality comes from the orthogonality of the Pauli operators under 
the Hilbert-Schmidt inner product. This proves the result for the diamond norm, 
Definition 2.2.10. For the distance measure defined in Definition 2.2.11 (TWIRL), 
the argument in [DCEL06] can be used together with the 1-norm bound to prove the 
result. □ 

3.8 Conclusions 



We have proved tight convergence results for the first two moments of a random 
circuit. We have used this to show that random circuits are efficient approximate 
1- and 2-unitary designs. Our framework readily generalises to k-designs for any k 
and the next step in this research is to prove that random circuits give approximate 
k-designs for all k. 

We have shown that, provided the random circuit uses gates from a universal 
gate set that is also universal on U(4), the circuit is still an efficient 2-design. We 
also see that the random circuit with gates chosen uniformly from U(4) is the most 
natural model. We note that the gates from U(4) can be replaced by gates from any 
approximate 2-design on two qubits without any change to the asymptotic convergence 
properties. 

Finally, random circuits are interesting physical models in their own right. The 
original purpose of [ODP07] was to answer the physical question of how quickly 
entanglement grows in a system with random two party interactions. Lemma 3.3.2(1) 
shows that $O(n(n + \log 1/\epsilon))$ steps suffice (in contrast to $O(n^2(n + \log 1/\epsilon))$ which 
they prove) to give almost maximal entanglement in such a system. 

3.9 Proofs 

3.9.1 Zero chain mixing time proofs 
Asymmetric Simple Random Walk 

We will use some facts about asymmetric simple random walks, i.e. a random walk on 
a 1D line with probability p of moving right at each step and probability q = 1 − p of 
moving left. 

The position of the walk after k steps is tightly concentrated around k(p − q): 

Lemma 3.9.1. Let $X_k$ be the random variable giving the position of a random walk 
after k steps starting at the origin with probability p of moving right and probability 
q = 1 − p of moving left. Let μ = p − q. Then for any η > 0, 

$$P(X_k \geq \mu k + \eta) \leq \exp\left(-\frac{\eta^2}{2k}\right)$$

and 

$$P(X_k \leq \mu k - \eta) \leq \exp\left(-\frac{\eta^2}{2k}\right).$$

Proof. The standard Chernoff bound for 0/1 variables $Y_i$ gives, with $Y_i$ equal to 1 
with probability p and for $\bar{Y}_k = \sum_{i=1}^{k} Y_i$, 

$$P(\bar{Y}_k \geq kp + \eta) \leq \exp\left(-\frac{2\eta^2}{k}\right)$$
$$P(\bar{Y}_k \leq kp - \eta) \leq \exp\left(-\frac{2\eta^2}{k}\right).$$

For our case, take the i-th step of the walk to be $2Y_i - 1$, so that $X_k = 2\bar{Y}_k - k$, to give 
the desired result. □ 
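For small k the lower tail can be computed exactly and compared with the Chernoff-type bound. The sketch below assumes the bound has the form $\exp(-\eta^2/(2k))$, which is what the standard 0/1 bound $\exp(-2\eta^2/k)$ gives after rescaling to ±1 steps; the function names are ours.

```python
import math


def lower_tail(k, p, eta):
    """Exact P(X_k <= mu*k - eta) for a +/-1 walk with right-probability p."""
    mu = 2 * p - 1
    total = 0.0
    for right in range(k + 1):
        position = 2 * right - k  # position after `right` right-steps out of k
        if position <= mu * k - eta:
            total += math.comb(k, right) * p**right * (1 - p) ** (k - right)
    return total


def chernoff_bound(k, eta):
    """Upper bound exp(-eta^2 / (2k)) on either tail."""
    return math.exp(-eta**2 / (2 * k))
```

Comparing `lower_tail(k, p, eta)` with `chernoff_bound(k, eta)` for a few values of η shows how loose the bound is near the mean and how both decay in the tail.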

This result is for a walk with constant bias. We will need a result for a walk with 
varying (but bounded from below) bias: 

Lemma 3.9.2. Let be the random variable giving the position of a random walk 

after k steps starting at the origin with probability pi > p of moving right and proba- 
bility qi < p of moving left at step i. Let jj, = p — {1 — p). Then for any rj > 0, 

F{Xk >l^k + r))< exp 

and 

<nk-r])< exp 

Proof. Let Yi be a random variable equal to 1 with probability p and with probability 
1 — p. Then let Zi be a random variable equal to 1 with probability pi and with 
probability 1 — pi. Let Y^ = Yi and = Yli=i ^i- Then following the standard 
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Chernoff bound derivation (for A > 0), 




gA(fcp+r)) 




We can then, as above, set Zi = 2Xi — 1. The calculation is similar for the bound on 



From Lemma 3.9.1 we can prove a result about how often each site is visited. If 
the walk runs for t steps the walk is at position tμ with high probability so we might 
expect from symmetry that each site will have been visited about 1/μ times. Below 
is a weaker concentration result of this form but is strong enough for our purposes. 
It says that the amount of time spent at positions $\leq x$ is about x/μ. 

Lemma 3.9.3. For 7 > 2 and integer x > 0, 



nXk<fik-rj). 



□ 




where I is the indicator function. 



Proof. Let = < x). From Lemma l3.9.1|, 




for k < x/ fj, and 




iov k > X / ^. 
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Then the quantity to evaluate is 



kfe=i 



We use a standard trick to spUt this into two mutually exclusive possibilities and then 
bound the probabilities separately. Write 



^ oo \ / IX /iJ, 

Kk=l J \j=l 

Y.Yk>lx/A[][ U K=0]1 |. (3.9.1) 
\k=l J \ j=l 



We can bound the first term: 



(-yx/fj, 
k=l 

< P {Y^^/^ = 1) 

/ /xa;(7-l)2 

<exp' 



The second term similarly: 



< 



/ oo \ 

U [^fc = i] 



k=^+i 

OO 



exp 



{kiJ, — xY 



2k 



< exp — 



lix(j - 2) 



□ 
The last fact we need about asymmetric simple random walks is a bound on the 
probability of going backwards. If p > q then we expect the walk to go right in the 
majority of steps. The probability of going left a distance a is exponentially small in 
a. This is a well known result, often stated as part of the gambler's ruin problem: 

Lemma 3.9.4 (See e.g. [GW86]). Consider an asymmetric simple random walk that 
starts at a > 0 and has an absorbing barrier at the origin. The probability that the 
walk eventually absorbs at the origin is 1 if $p \leq q$ and $(q/p)^a$ otherwise. 

This result is for infinitely many steps. If we only consider finitely many steps, 
the probability of absorption must be at most this. 

Waiting Time 

From above we saw that the probability of moving is at least 2x/5n when at position 
x. The length of time spent waiting at each step is therefore stochastically dominated 
by a geometric distribution with parameter 2x/5n. The following concentration result 
will be used to bound the waiting time (in our case β = 2/5): 

Lemma 3.9.5. Let the waiting time at each site be $W(x) \sim \mathrm{Geo}(\beta x/n)$, the total 
waiting time $W = \sum_{x=1}^{t} W(x)$ and $t' = \frac{n}{\beta} \ln t$. Then 

$$P(W > Ct') \leq 2t^{(1-C)/2}.$$
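To see why $t' = \frac{n}{\beta} \ln t$ is the natural scale, note that the mean total waiting time is a harmonic sum. The sketch below assumes Geo(q) counts the number of trials up to and including the first success (mean 1/q); the function names and the default β = 2/5 are illustrative.

```python
import math
from fractions import Fraction


def expected_total_wait(n, t, beta=Fraction(2, 5)):
    """E[W] = sum_{x=1}^{t} n / (beta x), using mean 1/q for Geo(q)."""
    return sum(Fraction(n) / (beta * x) for x in range(1, t + 1))


def t_prime(n, t, beta=0.4):
    """The scale t' = (n / beta) ln t used in Lemma 3.9.5."""
    return n / beta * math.log(t)
```

Since the harmonic number satisfies $H_t \leq 1 + \ln t$, the mean is at most $t' + n/\beta$, so the lemma states that W exceeds a constant multiple of its typical size only with probability polynomially small in t.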



Proof. By Markov's inequality for A > 0, 



The W{x) are independent so 



F{W > Ct') < 



x=l 
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Summing the geometric series we find 

]ggAVK(a;) ^ n_ 



e-^ - 1 + H± 



provided < — \^ for all 1 < x < t. Therefore is of the form — ^ where 

n n 

< a < 1. With this, 

]EgAty(x) ^ ^_ 
X — a 

and 



r(t + l-a)' 

We are free to choose a within its range to optimise the bound. However, for simplicity, 
we will choose a = 1/2. From Lemma 3.9.12, 

$$\mathbb{E} e^{\lambda W} \leq 2\sqrt{t}.$$

The result follows, using the inequality $1 - x \leq e^{-x}$. □ 
Phase 1 

Here we prove that phase 1 completes successfully with high probability. The bias 
here is large so the walk moves right every time with high probability: 

Lemma 3.9.6. The probability that the accelerated chain moves right at each step, 
starting from x = 1 for t steps, is at least 

$$1 - t^2/n.$$

Proof. The probability of moving right at each step is 

$$\prod_{x=1}^{t} \frac{3(n-x)}{3n-2x-1} = \frac{(n-2)(n-3)\cdots(n-t)}{(n-5/3)(n-7/3)\cdots(n-(2t+1)/3)}$$
$$\geq (1 - 2/n)(1 - 3/n) \cdots (1 - t/n)$$
$$\geq (1 - t/n)^t \geq 1 - t^2/n.$$

□ 
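The product in this proof can be evaluated exactly for small parameters. The following sketch (the function name is ours) computes the probability that the accelerated chain moves right on each of its first t steps from x = 1 and allows a direct comparison with the bound $1 - t^2/n$.

```python
from fractions import Fraction


def all_right_probability(n, t):
    """Probability that the accelerated chain moves right t times starting at x = 1."""
    prob = Fraction(1)
    for x in range(1, t + 1):
        prob *= Fraction(3 * (n - x), 3 * n - 2 * x - 1)
    return prob
```

The first factor (x = 1) equals 1, because a single non-zero site cannot gain a zero; the bound only becomes informative when t is well below $\sqrt{n}$, which is why phase 1 takes $t = n^\delta$ with δ < 1/2.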



Let $t = n^\delta$. Provided δ < 1/2 this probability is close to one. Therefore, with high 
probability, the walk moves to $n^\delta$ in $n^\delta$ steps. Using Lemma 3.9.5 the waiting time 
can be bounded: 

Lemma 3.9.7. Let $W^{(1)}$ be the waiting time during phase 1. Let H be the event that 
the walk moves right at each step. Then 

$$P(W^{(1)} > Ct' \mid H) \leq 2n^{\delta(1-C)/2} \qquad (3.9.2)$$

where $t' = \frac{5\delta n \ln n}{2}$. 

Proof. This follows directly from Lemma 3.9.5 since each site is hit exactly once. □ 

We now combine these two lemmas to prove that phase 1 completes successfully 
with high probability: 

Proof of Lemma 3.6.9. In Lemma 3.9.6 we show that in $n^\delta$ accelerated steps, the 
walk moves right at each step with probability $\geq 1 - n^{2\delta-1}$. Call this event H. Then 
$P(H) \geq 1 - n^{2\delta-1}$. Lemma 3.9.7 shows that the waiting time $W^{(1)}$ is bounded with 
high probability (choosing C = 3): 

$$P(W^{(1)} \leq 15n\delta \ln n/2 \mid H) \geq 1 - 2n^{-\delta}.$$

Then we can bound the probability of phase 1 completing successfully: 

$$P(\text{Phase 1 completes successfully}) \geq P(H \cap W^{(1)} \leq 15n\delta \ln n/2)$$
$$= P(H) P(W^{(1)} \leq 15n\delta \ln n/2 \mid H)$$
$$\geq (1 - n^{2\delta-1})(1 - 2n^{-\delta})$$
$$\geq 1 - n^{2\delta-1} - 2n^{-\delta}.$$

□ 



Phase 2

Phase 2 starts at $n^\delta/2$ and finishes when the walk has reached $\theta n$ for some constant $0 < \theta < 3/4$. We show that, with high probability, this also takes time $O(n \log n)$. The probability of moving right during this phase is at least $p = \frac{3(1-\theta)}{3-2\theta}$. We now define some constants in terms of which we will derive bounds. Let $\gamma$ be a constant $> 2$. Let $\mu = p - (1 - p)$ and $\tilde\mu = \mu/\gamma$. Finally let $s = \tilde\mu t$ for some $t$ (which will be the number of accelerated steps). Then, with high probability, the walk will have passed $s$ after $t$ steps:
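To see why the constraint $\theta < 3/4$ appears, the following minimal sketch evaluates the drift $\mu = p - (1-p)$. It assumes $p = 3(1-\theta)/(3-2\theta)$, which is what the accelerated chain's right-move probability $3(n-x)/(3n-2x-1)$ from Lemma 3.9.6 gives at $x = \theta n$ once the $-1$ in the denominator is dropped; the function names are our own.

```python
def right_move_prob(theta: float) -> float:
    """Assumed lower bound p on the probability of moving right while x <= theta * n."""
    return 3 * (1 - theta) / (3 - 2 * theta)


def drift(theta: float) -> float:
    """mu = p - (1 - p): the expected displacement per accelerated step."""
    p = right_move_prob(theta)
    return p - (1 - p)


if __name__ == "__main__":
    for theta in (0.1, 0.5, 0.7, 0.75, 0.8):
        print(theta, right_move_prob(theta), drift(theta))
```

The drift is positive exactly when $\theta < 3/4$, so the walk has a bias to the right throughout phase 2.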



Lemma 3.9.8. Let $X_t$ be the position of the walk at accelerated step $t$, where $X_0 = n^\delta$. Then

\[ P(X_t \le s) \le \exp\left(-\mu^2 t (1 - 1/\gamma)^2 / 2\right). \]

Proof. Let $X'_t = X_t - n^\delta$. Then from Lemma 3.9.2

\[ P(X'_t \le \mu t - \eta) \le \exp\left(-\eta^2 / 2t\right). \]

Now let $\eta = \mu t - s$ and use

\[ P(X_t \le s) = P(X'_t \le s - n^\delta) \le P(X'_t \le s) \]

to complete the proof. □

We now prove a bound on the waiting time:

Lemma 3.9.9. Let $W^{(2)}$ be the waiting time in phase 2. Then, assuming the walk does not go back beyond $n^\delta/2$,



^l^^rO\ 15nlns\ ,,,^'!/o„ 2exp^ ^ 



l-exp(^) 



(3.9.3) 



Proof. Let $W_k \sim \mathrm{Geo}\left(\frac{2X_k}{5n}\right)$ where $X_k$ is the position of the walk at accelerated step $k$ ($X_0 = n^\delta$). We want to bound (w.h.p.) the waiting time $W^{(2)} = \sum_{k=1}^{t} W_k$ of $t$ steps of the accelerated walk.
Define the event H to be 



H 



n 



.k=l 



(3.9.4) 



If $H$ occurs, no sites have been hit too often and the walk has not gone back further than $n^\delta/2$. It is important that we also use the restriction that $X_k \ge n^\delta/2$ because the waiting time grows the longer the walk moves back. However, it is very unlikely that the walk will go backwards (even to $n^\delta/2$).

We now define some more notation to bound the waiting time. Let $\mathbf{X} = (X_1, X_2, \ldots, X_t)$ be a tuple of positions, let $N_x(\mathbf{X})$ be the number of times that $x$ appears in $\mathbf{X}$ and let $N(\mathbf{X}) = (N_1(\mathbf{X}), N_2(\mathbf{X}), \ldots, N_n(\mathbf{X}))$. Then we have

\[ \sum_x N_x(\mathbf{X}) = t. \]

As we said above, the waiting time at $x = a$ stochastically dominates the waiting time at $x = b$ for $b > a$. In other words,

\[ W_k \succeq W_{k'} \quad \text{if } X_k \le X_{k'} \tag{3.9.5} \]

where $X \succeq Y$ means that $X$ stochastically dominates $Y$. Now write the waiting time for all steps

\[ W^{(2)}(\mathbf{X}) = \sum_{k=1}^{t} W_k = \sum_{x} \sum_{h=1}^{N_x(\mathbf{X})} W_h(x) \tag{3.9.6} \]

where $W_h(x) \sim \mathrm{Geo}\left(\frac{2x}{5n}\right)$.

If event $H$ occurs, we can put some bounds on $N_x$. We find that, for all $x \ge n^\delta/2$,

\[ \sum_{y=n^\delta/2}^{x} N_y(\mathbf{X}) \le x/\tilde\mu \tag{3.9.7} \]

and $N_x(\mathbf{X}) = 0$ for $x < n^\delta/2$. Now let $\mathbf{X}_m$ be such that $N_{n^\delta/2}(\mathbf{X}_m) = n^\delta/(2\tilde\mu)$ and $N_x(\mathbf{X}_m) = 1/\tilde\mu$ for $x > n^\delta/2$. Then

\[ \sum_{y=n^\delta/2}^{x} N_y(\mathbf{X}_m) = x/\tilde\mu. \tag{3.9.8} \]

Now we introduce the relation $\preccurlyeq$:

Definition 3.9.10. Let $x$ and $y$ be $n$-tuples. Then $x \preccurlyeq y$ if

\[ \sum_{i=1}^{k} x_i \le \sum_{i=1}^{k} y_i \tag{3.9.9} \]

for all $1 \le k \le n$ with equality for $k = n$.

Note that this is like majorisation, except the elements of the tuples are not sorted. Using this, we find that $N(\mathbf{X}) \preccurlyeq N(\mathbf{X}_m)$. (Using $\sum_y N_y(\mathbf{X}) = \sum_y N_y(\mathbf{X}') = t$ for all $\mathbf{X}, \mathbf{X}'$.)
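The relation of Definition 3.9.10 is easy to state in code. The sketch below is our own illustration (the function name `preceq` is hypothetical): it compares running partial sums of two equal-length tuples without sorting them, which is the difference from majorisation noted above.

```python
from itertools import accumulate
from typing import Sequence


def preceq(x: Sequence[int], y: Sequence[int]) -> bool:
    """True if every partial sum of x is at most the matching partial sum of y,
    with equal totals. The tuples are compared in the given order (no sorting)."""
    if len(x) != len(y):
        raise ValueError("tuples must have the same length")
    sums_x = list(accumulate(x))
    sums_y = list(accumulate(y))
    if not sums_x:
        return True
    return all(a <= b for a, b in zip(sums_x, sums_y)) and sums_x[-1] == sums_y[-1]


if __name__ == "__main__":
    print(preceq((0, 1, 2), (1, 1, 1)))  # hit counts shifted towards later sites
    print(preceq((2, 1, 0), (1, 1, 1)))  # hit counts shifted towards earlier sites
```

In the proof, the tuples are the hit counts $N(\mathbf{X})$: a tuple that is larger in this order hits the early (slow) sites more often.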

If we combine Equations 3.9.5 and 3.9.6 we find that $W^{(2)}(\mathbf{X}) \succeq W^{(2)}(\mathbf{X}')$ if $N(\mathbf{X}') \preccurlyeq N(\mathbf{X})$. Roughly speaking, this is simply saying that the waiting time is larger if the earlier sites are hit more often. But since for all $\mathbf{X}$ that satisfy $H$, $\mathbf{X} \preccurlyeq \mathbf{X}_m$, we have $W^{(2)}(\mathbf{X}) \preceq W^{(2)}(\mathbf{X}_m)$ provided $H$ occurs. We will simplify further by noting that $\mathbf{X}_m \preccurlyeq \mathbf{X}_0$ where $N_x(\mathbf{X}_0) = 1/\tilde\mu$ for $1 \le x \le \tilde\mu t = s$ and zero elsewhere.
Therefore



^(2)(X) > 



2/i 



We can bound this by applying Lemma 3.9.5. Let $W_h = \sum_{x=1}^{s} W_h(x)$. From Lemma 3.9.5,

> Ct') < 2s^ (3.9.10) 



where $t' = \frac{5n \ln s}{2}$. However, we want a bound on $P\left(\sum_{h=1}^{1/\tilde\mu} W_h \ge Ct'/\tilde\mu\right)$. The same reasoning as in Lemma 3.9.5 bounds this as

F XI - ^^'/f^ - (^^^) • (3.9.11) 



Therefore 



^(^)(Xo)>^^l<2VA,^. (3.9.12) 
2/x J 



To complete the proof, we just need to find $P(H^c)$. We can bound it using the union bound and Lemma 3.9.3:
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u 



k=l 

oo 



< ^[J2^(.Xk<x)>x/fi 

x=n'5/2 \fc=l / 
x=n^/2 



< 2exp 

x=n^/2 



-fix{j - 2) 



2 exp 



-Mn^(7-2)' 



1 _ exp (^^) 



Now, for any events $A$ and $B$

\[
\begin{aligned}
P(A) &= P(A \cap B) + P(A \cap B^c) \\
 &= P(A|B)P(B) + P(A \cap B^c) \\
 &\le P(A|B) + P(B^c)
\end{aligned}
\]

and set $C = 2$ and $\gamma = 3$ to obtain the result. □

We now combine these two lemmas to prove that phase 2 completes successfully with high probability:

Proof of Lemma 3.6.10. Phase 2 can fail if:

• The walk does not reach $\theta n$. The probability of this is bounded by Lemma 3.9.8:

\[ P(X_t \le \theta n) \le \exp\left(-\tfrac{2}{3}\mu\theta n\right). \]

This follows from setting $t = 3\theta n/\mu$ and $\gamma = 3$.

• The waiting time is too long. This probability is bounded by Lemma 3.9.9:



I5n\n{dn)\ ^/4\4, 2exp[-f^ 



< 7^ + 



\9n J 1 — exp(— ^/2) 



• The walk gets back to $n^\delta/2$. This is bounded by Lemma 3.9.4:

\[ P(\text{Walk gets to } n^\delta/2) \le (q/p)^{n^\delta/2}. \]

So, using the union bound we can bound the overall probability of failure:



P(Phase 2 fails) < exp 



2 ^ ^ ^4xi 2exp(^ 



3^^"j + i^j +T3^xp(-^/2) 



Phase 3

This phase starts at $\theta n$. We show that this mixes quickly using log-Sobolev arguments.

Lemma 3.9.11. The zero chain on the restricted state space $x \in [m, n]$ where $m = \theta n$ for $\theta > 0$ has mixing time $O\left(n \log \frac{n}{\epsilon}\right)$.

Proof. We restrict the Markov chain to only run from $m$ by adjusting the holding probability at $m$, $P(m, m)$. Construct the chain $P'$ with transition matrix

\[
P'(x, y) = \begin{cases}
0 & x < m \text{ or } y < m \\
1 - P(m, m+1) & x = y = m \\
P(x, y) & \text{otherwise}
\end{cases}
\tag{3.9.13}
\]

where $P$ is the transition matrix of the full zero chain. This chain then has stationary distribution

\[
\pi'(x) = \begin{cases}
\pi(x)/(1 - w) & m \le x \le n \\
0 & \text{otherwise}
\end{cases}
\tag{3.9.14}
\]

where $w = \sum_{x=1}^{m-1} \pi(x)$. To see this, first note that the distribution is normalised. We want to show that

\[ \sum_{x=m}^{n} P'(x, y)\pi'(x) = \pi'(y). \tag{3.9.15} \]

When $y = m$ we are required to prove that $P'(m, m)\pi'(m) + P'(m+1, m)\pi'(m+1) = \pi'(m)$. This follows from the reversibility of the unrestricted zero chain, using $P'(m, m) = 1 - P(m, m+1)$. For $y > m$, Eqn. 3.9.15 is satisfied simply because $\pi(x)$ is the stationary distribution of $P$ and related by a constant factor to $\pi'(x)$.

We can now prove this final mixing time result, making use of Lemma 3.5.12. Let $Q_i$ be the chain that uniformly mixes site $i$. This converges in one step and has a log-Sobolev constant independent of $n$; call it $\rho_1$. Let $Q$ be the chain that chooses a site at random and then uniformly mixes that site. This is the product chain of the $Q_i$ so, by Lemma 3.5.12, has gap $1/n$ and Sobolev constant $\rho_Q = \rho_1/n$. We can construct the zero chain for this and find its Sobolev constant.

The Sobolev constant is defined (Definition 3.5.10) in terms of a minimisation over functions on the state space. For the chain $Q$ we can write

\[ \rho_Q = \inf_\phi f(\phi). \]

If we restrict the infimum to be over functions $\phi$ with $\phi(x) = \phi(y)$ for $x$ and $y$ containing the same number of zeroes then we obtain the Sobolev constant for the zero-$Q$ chain, $\rho_{Q_0}$, which is the chain which counts the number of zeroes in the full chain $Q$. Since taking the infimum over fewer functions cannot give a smaller value,

\[ \rho_{Q_0} \ge \rho_Q \ge \rho_1/n. \]

We can now compare this chain to the zero-$P$ chain. The stationary distributions are the same. The transition matrix for the zero-$Q$ chain is

\[
Q_0(x, y) = \begin{cases}
\frac{n+2x}{4n} & y = x \\
\frac{x}{4n} & y = x - 1 \\
\frac{3(n-x)}{4n} & y = x + 1 \\
0 & \text{otherwise.}
\end{cases}
\]

Then construct $Q'_0$ by restricting the space to only run from $m$ in exactly the same way as $P'$ is constructed from $P$. $Q'_0$ has the same stationary distribution as $P'$. Now we can perform the comparison. From Eqn. 3.5.14,

\[
A = \max_{a \ge m} \frac{Q'_0(a, a+1)}{P'(a, a+1)} = \max_{a \ge m} \frac{5(n-1)}{8a} \le \frac{5}{8\theta}.
\]

Therefore $\rho_{P'} \ge \rho_{Q'_0}/A = \Omega(1/n)$. Exactly the same argument applies to show the gap is $\Omega(1/n)$ so the mixing time is (from Eqn. 3.5.20) $O\left(n \log \frac{n}{\epsilon}\right)$. □

Now we can prove that phase 3 completes successfully with high probability:

Proof of Lemma 3.6.11. In Lemma 3.9.11 we show that after $O\left(n \log \frac{n}{\epsilon}\right)$ steps the chain mixes to distance $\epsilon$. We just need to show that the walk goes back to $\theta n/2$ with small probability. This follows from Lemma 3.9.4. □
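The transition probabilities of the zero-$Q$ chain used in the proof of Lemma 3.9.11 can be rederived from the description of $Q$. The sketch below is illustrative only and rests on two assumptions of ours: each site takes one of four values, exactly one of which is "zero", and the state $x$ counts the non-zero sites. Under those assumptions the derived probabilities agree with the closed forms $x/4n$, $3(n-x)/4n$ and $(n+2x)/4n$.

```python
from fractions import Fraction
from typing import Dict


def q0_from_description(n: int, x: int) -> Dict[int, Fraction]:
    """One step of Q seen through the count x of non-zero sites:
    pick a site uniformly, then replace its value by a uniform one of 4 values."""
    pick_nonzero = Fraction(x, n)
    pick_zero = 1 - pick_nonzero
    down = pick_nonzero * Fraction(1, 4)  # a non-zero site becomes zero
    up = pick_zero * Fraction(3, 4)       # a zero site becomes non-zero
    return {x - 1: down, x + 1: up, x: 1 - down - up}


def q0_closed_form(n: int, x: int) -> Dict[int, Fraction]:
    """The closed-form transition probabilities of the zero-Q chain."""
    return {
        x - 1: Fraction(x, 4 * n),
        x + 1: Fraction(3 * (n - x), 4 * n),
        x: Fraction(n + 2 * x, 4 * n),
    }


if __name__ == "__main__":
    n = 7
    for x in range(1, n + 1):
        print(x, q0_from_description(n, x) == q0_closed_form(n, x))
```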

3.9.2 Moment Generating Function Calculations 

The following lemma is needed in the moment generating function calculations.

Lemma 3.9.12. For integer $s > 0$,

\[ \frac{\Gamma(s+1)\Gamma(1/2)}{\Gamma(s+1/2)} \le 2\sqrt{s}. \tag{3.9.16} \]

Proof. From expanding the $\Gamma$ functions, the left-hand side of Eqn. 3.9.16 becomes

\[
\frac{s!\,2^s}{(2s-1)!!} = \frac{2 \times 4 \times 6 \times \cdots \times 2(s-1) \times 2s}{1 \times 3 \times 5 \times \cdots \times (2s-3) \times (2s-1)} = \prod_{x=1}^{s} \frac{2x}{2x-1}.
\]

We then proceed by induction. $\prod_{x=1}^{s} \frac{2x}{2x-1} \le 2\sqrt{s}$ by the inductive hypothesis, so

\[ \prod_{x=1}^{s+1} \frac{2x}{2x-1} \le 2\sqrt{s}\,\frac{2(s+1)}{2(s+1)-1}. \]

It is easy to show that $\sqrt{s}\,\frac{2(s+1)}{2(s+1)-1} \le \sqrt{s+1}$ and the result follows. □
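As a numerical sanity check on Lemma 3.9.12, the sketch below compares the Gamma-function ratio with the product $\prod_{x=1}^{s} 2x/(2x-1)$ from the proof. It also compares against $2\sqrt{s}$, which is how we read the right-hand side of Eqn. 3.9.16; that reading is an assumption on our part, and the function names are our own.

```python
import math


def gamma_ratio(s: int) -> float:
    """Gamma(s + 1) * Gamma(1/2) / Gamma(s + 1/2)."""
    return math.gamma(s + 1) * math.gamma(0.5) / math.gamma(s + 0.5)


def product_form(s: int) -> float:
    """prod_{x=1}^{s} 2x / (2x - 1), the form used in the inductive proof."""
    out = 1.0
    for x in range(1, s + 1):
        out *= 2 * x / (2 * x - 1)
    return out


if __name__ == "__main__":
    for s in (1, 2, 5, 20, 60):
        print(s, gamma_ratio(s), product_form(s), 2 * math.sqrt(s))
```

The ratio grows like $\sqrt{\pi s}$, so a bound of order $\sqrt{s}$ is the natural scale.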

3.9.3 Mixing Times 

We find bounds for the mixing time above that are valid with high probability. Below we turn these into full mixing time bounds.

Lemma 3.9.13. If after $O(n \log n)$ steps the state $v$ of a random walk satisfies

\[ \|v - \pi\| \le \delta \]

where $\pi$ is the stationary distribution and $\delta$ is $1/\mathrm{poly}(n)$ then the number of steps required to be at most a distance $\epsilon$ from stationarity is

\[ O\left(n \log \frac{n}{\epsilon}\right). \]

Proof. Let $s$ be the slowest mixing initial state. Then, after $t = O(n \log n)$ steps we have at worst the state

\[ (1 - \delta)\pi + \delta s \]

and if we repeat this $k$ times $\delta$ becomes $\delta^k$. So to get a distance $\epsilon$, $k = \frac{\log \epsilon}{\log \delta}$. Now we evaluate the mixing time:

\[
\begin{aligned}
kt &= O(n \log n)\,\frac{\log \epsilon}{\log \delta} \\
 &= O(n \log n)\,\frac{\log 1/\epsilon}{\log 1/\delta} \\
 &= O(n \max(\log n, \log 1/\epsilon)) \\
 &= O\left(n \log \frac{n}{\epsilon}\right).
\end{aligned}
\]

□
Chapter 4

Quantum Tensor Product 
Expanders and an Efficient 
Unitary Design Construction 

4.1 Introduction 

In this chapter, we give an efficient construction of a unitary k-design on n qubits for any k up to O(n/log(n)). We will do this by first finding an efficient construction of a quantum k-copy tensor product expander (k-TPE), which can then be iterated to produce a k-design. We will therefore need to understand some of the theory of expanders before presenting our construction.

Classical expander graphs have the property that a marker executing a random walk on the graph will have a distribution close to the stationary distribution after a small number of steps. We consider a generalisation of this, known as a k-tensor product expander (TPE) and due to [HH09], to graphs that randomise k different markers carrying out correlated random walks on the same graph. This is a stronger requirement than for a normal (k = 1) expander because the correlations between walkers (unless they start at the same position) must be broken. We then generalise quantum expanders in the same way, so that the unitaries act on k copies of the system. We give an efficient construction of a quantum k-TPE which uses an efficient classical k-TPE as its main ingredient. We then give as a key application the first efficient construction of a unitary k-design for any k.

While randomised constructions yield k-designs (by a modification of Theorem 5 of [ABW09]) and k-TPEs (when the dimension is polynomially larger than k [HH09]) with near-optimal parameters, these approaches are not efficient. Previous efficient constructions of k-designs were known only for k = 1, 2, and no efficient constant-degree, constant-gap quantum k-TPEs were previously known, except for the k = 1 case corresponding to quantum expanders [BT07, AS04, Har08, GE08].

In Section 4.1.1, we will define quantum expanders and other key terms. Then in Section 4.1.2, we will describe our main result which will be proved in Section 4.2. In this chapter, we will use N to denote the dimension rather than d to be consistent with the rest of the quantum expander literature.

This chapter has been published previously as [HL09a] and is joint work with Aram Harrow.

4.1.1 Quantum Expanders 

We will only consider D-regular expander graphs here. We can think of a random walk on such a graph as selecting one of D permutations of the vertices randomly at each step. We construct the permutations as follows. Label the vertices from 1 to N. Then label each edge from 1 to D so that each edge label appears exactly once on the incoming and outgoing edges of each vertex. This gives a set of D permutations. Choosing one of these permutations at random (for some fixed probability distribution) then defines a random walk on the graph.

We now define a classical k-TPE:

Definition 4.1.1 ([HH09]). Let $\nu$ be a probability distribution on $S_N$ with support on $\le D$ permutations. Then $\nu$ is a classical $(N, D, \lambda, k)$-TPE if

\[
\left\| \mathbb{E}_{\pi \sim \nu}\left[B(\pi)^{\otimes k}\right] - \mathbb{E}_{\pi \sim S_N}\left[B(\pi)^{\otimes k}\right] \right\|_\infty \le \lambda \tag{4.1.1}
\]

with $\lambda < 1$. Here $\mathbb{E}_{\pi \sim \nu}$ means the expectation over $\pi$ drawn according to $\nu$ and $\mathbb{E}_{\pi \sim S_N}$ means the expectation over $\pi$ drawn uniformly from $S_N$.

Setting k = 1 recovers the usual spectral definition of an expander. Note that a $(N, D, \lambda, k)$-TPE is also a $(N, D, \lambda, k')$-TPE for any $k' \le k$. The largest meaningful value of k is k = N, corresponding to the case when $\nu$ describes a Cayley graph expander on $S_N$.

The degree of the map is $D = |\mathrm{supp}\,\nu|$ and the gap is $1 - \lambda$. Ideally, the degree should be small and gap large. To be useful, these should normally be independent of N and possibly k. We say that a TPE construction is efficient if it can be implemented in poly(log N) steps. There are known constructions of efficient classical TPEs. The construction of Hoory and Brodsky [BH08] provides an expander with D = poly(log N) and $\lambda = 1 - 1/\mathrm{poly}(k, \log N)$ with efficient running time. An efficient TPE construction is also known, due to Kassabov [Kas05], which has constant degree and gap (independent of N and k).

Similarly, we define a quantum k-TPE:

Definition 4.1.2 ([HH09]). Let $\nu$ be a distribution on $U(N)$, the group of $N \times N$ unitary matrices, with $D = |\mathrm{supp}\,\nu|$. Then $\nu$ is a quantum $(N, D, \lambda, k)$-TPE if

\[
\left\| \mathbb{E}_{U \sim \nu}\left[U^{\otimes k,k}\right] - \mathbb{E}_{U \sim U(N)}\left[U^{\otimes k,k}\right] \right\|_\infty \le \lambda \tag{4.1.2}
\]

with $\lambda < 1$. Here $\mathbb{E}_{U \sim U(N)}$ means the expectation over $U$ drawn from the Haar measure.

Again, normally we want D and $\lambda$ to be constants and setting k = 1 recovers the usual definition of a quantum expander. Note that an equivalent statement of the above definition is that, for all $\rho$,

\[
\left\| \mathbb{E}_{U \sim \nu}\left[U^{\otimes k} \rho (U^\dagger)^{\otimes k}\right] - \mathbb{E}_{U \sim U(N)}\left[U^{\otimes k} \rho (U^\dagger)^{\otimes k}\right] \right\|_2 \le \lambda \|\rho\|_2. \tag{4.1.3}
\]

A natural application of this is to make an efficient unitary k-design. The definition we use here is the same as for a k-TPE, except with closeness in the 1-norm rather than the $\infty$-norm. This is given in Definition 2.2.12 (TRACE).

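We can make an $\epsilon$-approximate unitary k-design from a quantum k-TPE with $O(k \log N)$ overhead:

Theorem 4.1.3. If $\nu$ is a quantum $(N, D, \lambda, k)$-TPE then iterating the map $m = \frac{1}{\log 1/\lambda} \log \frac{N^{2k}}{\epsilon}$ times gives an $\epsilon$-approximate unitary k-design according to Definition 2.2.12 (TRACE) with $D^m$ unitaries.

Proof. Iterating the TPE m times gives

\[
\left\| \mathbb{E}_{U \sim \nu^m}\left[U^{\otimes k,k}\right] - \mathbb{E}_{U \sim U(N)}\left[U^{\otimes k,k}\right] \right\|_\infty \le \lambda^m.
\]

This implies that

\[
\left\| \mathbb{E}_{U \sim \nu^m}\left[U^{\otimes k,k}\right] - \mathbb{E}_{U \sim U(N)}\left[U^{\otimes k,k}\right] \right\|_1 \le N^{2k} \lambda^m.
\]

We take m such that $N^{2k}\lambda^m = \epsilon$ to give the result.

□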
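The iteration count in Theorem 4.1.3 follows from solving $N^{2k}\lambda^m = \epsilon$ for $m$. The sketch below is a hedged illustration (the function name and the example parameters are our own): it returns the smallest integer $m$ with $N^{2k}\lambda^m \le \epsilon$, working in logarithms so that $N^{2k}$ is never formed as a float.

```python
import math


def iterations_for_design(N: int, k: int, lam: float, eps: float) -> int:
    """Smallest integer m >= 0 such that N^(2k) * lam^m <= eps."""
    if not (0 < lam < 1) or eps <= 0 or N < 1 or k < 1:
        raise ValueError("need 0 < lam < 1, eps > 0, N >= 1 and k >= 1")
    log_target = 2 * k * math.log(N) - math.log(eps)  # log(N^(2k) / eps)
    return max(0, math.ceil(log_target / math.log(1 / lam)))


if __name__ == "__main__":
    # Hypothetical parameters: n = 10 qubits (N = 2^10), k = 2, gap 1/2.
    m = iterations_for_design(N=2**10, k=2, lam=0.5, eps=1e-3)
    print("iterations:", m)
    print("unitaries in the design for degree D = 3:", 3**m)
```

For constant $\lambda$ the count scales as $O(k \log N + \log 1/\epsilon)$, which is the $O(k \log N)$ overhead quoted above, and the design then contains $D^m$ unitaries.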



Corollary 4.1.4. A construction of an efficient quantum $(N, D, \lambda, k)$-TPE yields an efficient approximate unitary k-design, provided $\lambda = 1 - 1/\mathrm{poly}(\log N)$. Further, if D and $\lambda$ are constants, the number of unitaries in the design is $N^{O(k)}$.

Our approach to construct an efficient quantum k-TPE will be to take an efficient classical 2k-TPE and mix it with a quantum Fourier transform. The degree is thus only larger than the degree of the classical expander by one. Since the quantum Fourier transform on $\mathbb{C}^N$ requires poly(log N) time, it follows that if the classical expander is efficient then the quantum expander is as well. The main technical difficulty is to show for suitable values of k that the gap of the quantum TPE is not too much worse than the gap of the classical TPE.

A similar approach to ours was first used in [HH09] to construct a quantum expander (i.e. a 1-TPE) by mixing a classical 2-TPE with a phase. However, regardless of the set of phases chosen, this approach will not yield quantum k-TPEs from classical 2k-TPEs for any $k \ge 2$.

4.1.2 Main Result 

Let $\omega = e^{2\pi i/N}$ and define the N-dimensional Fourier transform to be

\[ \mathcal{F} = \frac{1}{\sqrt{N}} \sum_{m=1}^{N} \sum_{n=1}^{N} \omega^{mn} |m\rangle\langle n|. \tag{4.1.4} \]

Define $\delta_{\mathcal{F}}$ to be the distribution on $U(N)$ consisting of a point mass on $\mathcal{F}$. Our main result in this chapter is that mixing $\delta_{\mathcal{F}}$ with a classical 2k-TPE yields a quantum k-TPE for appropriately chosen k and N.

Theorem 4.1.5. Let $\nu_C$ be a classical $(N, D, 1 - \epsilon_C, 2k)$-TPE, and for $0 < p < 1$, define $\nu_Q = p\nu_C + (1 - p)\delta_{\mathcal{F}}$. Suppose that

\[ \epsilon_A := 1 - 2(2k)^{4k}/\sqrt{N} > 0. \tag{4.1.5} \]

Then $\nu_Q$ is a quantum $(N, D + 1, 1 - \epsilon_Q, k)$-TPE where

\[ \epsilon_Q \ge \frac{\epsilon_A}{12} \min(p\epsilon_C, 1 - p) > 0. \tag{4.1.6} \]

The bound in Eqn. 4.1.6 is optimised when $p = 1/(1 + \epsilon_C)$, in which case we have

\[ \epsilon_Q \ge \frac{\epsilon_A \epsilon_C}{24}. \tag{4.1.7} \]

This means that any constant-degree, constant-gap classical 2k-TPE gives a quantum k-TPE with constant degree and gap. If the classical TPE is efficient then the quantum TPE is as well. Using Corollary 4.1.4, we obtain approximate unitary k-designs with polynomial-size circuits.

Unfortunately the construction does not work for all dimensions; we require that $N = \Omega((2k)^{8k})$, so that $\epsilon_A$ is lower-bounded by a positive constant. However, in applications normally k is fixed. An interesting open problem is to find a construction that works for all dimensions, in particular a $k = \infty$ expander. (Most work on $k = \infty$ TPEs so far has focused on the N = 2 case [BG06].) We suspect our construction may work for k as large as cN for a small constant c. On the other hand, if 2k > N then the gap in our construction drops to zero.
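The dimension requirement and the choice of $p$ in Theorem 4.1.5 can be explored numerically. The sketch below is illustrative only: it assumes the form $\epsilon_A = 1 - 2(2k)^{4k}/\sqrt{N}$, which is consistent with the requirement $N = \Omega((2k)^{8k})$, and it checks that $p = 1/(1 + \epsilon_C)$ balances the two arguments of the minimum in Eqn. 4.1.6. The function names are our own.

```python
import math
from fractions import Fraction


def eps_A(k: int, N: int) -> float:
    """Assumed form of the quantity in Eqn. 4.1.5: 1 - 2 (2k)^(4k) / sqrt(N)."""
    return 1 - 2 * (2 * k) ** (4 * k) / math.sqrt(N)


def dimension_threshold(k: int) -> int:
    """eps_A(k, N) > 0 exactly when N exceeds 4 (2k)^(8k)."""
    return 4 * (2 * k) ** (8 * k)


def balanced_p(eps_C: Fraction) -> Fraction:
    """The mixing weight p = 1 / (1 + eps_C), at which p * eps_C = 1 - p."""
    return 1 / (1 + eps_C)


if __name__ == "__main__":
    for k in (1, 2, 3):
        print(k, "needs log2(N) >", math.log2(dimension_threshold(k)))
    p = balanced_p(Fraction(1, 2))
    print(p, p * Fraction(1, 2), 1 - p)
```

Since $N = 2^n$, the threshold translates into $n > 2 + 8k \log_2(2k)$ qubits, in line with the restriction $k = O(n/\log n)$.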

4.2 Proof of Theorem 4.1.5 
4.2.1 Proof overview 

First, we introduce some notation. Define $\mathcal{E}^{2k}_{S_N} = \mathbb{E}_{\pi \sim S_N}\left[B(\pi)^{\otimes 2k}\right]$ and $\mathcal{E}^{k}_{U(N)} = \mathbb{E}_{U \sim U(N)}\left[U^{\otimes k,k}\right]$. These are both projectors onto spaces which we label $V_{S_N}$ and $V_{U(N)}$ respectively. Since $V_{U(N)} \subset V_{S_N}$, it follows that $\mathcal{E}^{2k}_{S_N} - \mathcal{E}^{k}_{U(N)}$ is a projector onto the space $V_0 := V_{S_N} \cap V_{U(N)}^\perp$. We also define $\mathcal{E}^{2k}_{\nu_C} = \mathbb{E}_{\pi \sim \nu_C}\left[B(\pi)^{\otimes 2k}\right]$ and $\mathcal{E}^{k}_{\nu_Q} = \mathbb{E}_{U \sim \nu_Q}\left[U^{\otimes k,k}\right]$.

The idea of our proof is to consider $\mathcal{E}^{2k}_{S_N}$ a proxy for $\mathcal{E}^{2k}_{\nu_C}$; if $\lambda_C$ is small enough then this is a reasonable approximation. Then we can restrict our attention to vectors in $V_0$, which we would like to show all shrink substantially under the action of our expander. This in turn can be reduced to showing that $\mathcal{F}^{\otimes k,k}$ maps any vector in $V_0$ to a vector that has $\Omega(1)$ amplitude in $V_{S_N}^\perp$. This last step is the most technically involved step of the chapter, and involves careful examination of the different vectors making up $V_{S_N}$.

Thus, our proof reduces to two key Lemmas. The first allows us to substitute $\mathcal{E}^{2k}_{\nu_C}$ for $\mathcal{E}^{2k}_{S_N}$ while keeping the gap constant.

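Lemma 4.2.1 ([HH09] Lemma 1). Let $\Pi$ be a projector and let $X$ and $Y$ be operators such that $\|X\|_\infty \le 1$, $\|Y\|_\infty \le 1$, $\Pi X = X\Pi = \Pi$, $\|(I - \Pi)X(I - \Pi)\|_\infty \le 1 - \epsilon_C$ and $\|\Pi Y \Pi\|_\infty \le 1 - \epsilon_A$. Assume $0 < \epsilon_C, \epsilon_A < 1$. Then for any $0 < p < 1$, $\|pX + (1 - p)Y\|_\infty < 1$. Specifically,

\[ \|pX + (1 - p)Y\|_\infty \le 1 - \frac{\epsilon_A}{12} \min(p\epsilon_C, 1 - p). \tag{4.2.1} \]

We will restrict to $V_{U(N)}^\perp$, or equivalently, subtract the projector $\mathcal{E}^{k}_{U(N)}$ from each operator. Thus we have $X = \mathcal{E}^{2k}_{\nu_C} - \mathcal{E}^{k}_{U(N)}$, $\Pi = \mathcal{E}^{2k}_{S_N} - \mathcal{E}^{k}_{U(N)}$ and $Y = \mathcal{F}^{\otimes k,k} - \mathcal{E}^{k}_{U(N)}$. According to Definition 4.1.1, we have the bound

\[ \|(I - \Pi)X(I - \Pi)\|_\infty = \left\|\mathcal{E}^{2k}_{\nu_C} - \mathcal{E}^{2k}_{S_N}\right\|_\infty \le 1 - \epsilon_C. \tag{4.2.2} \]

It will remain only to bound $\lambda_A := 1 - \epsilon_A$.

Lemma 4.2.2. For $N \ge (2k)^2$,

\[
\lambda_A = \left\| \left(\mathcal{E}^{2k}_{S_N} - \mathcal{E}^{k}_{U(N)}\right) \mathcal{F}^{\otimes k,k} \left(\mathcal{E}^{2k}_{S_N} - \mathcal{E}^{k}_{U(N)}\right) \right\|_\infty \le 2(2k)^{4k}/\sqrt{N}. \tag{4.2.3}
\]

Combining Eqn. 4.2.2, Lemma 4.2.2 and Lemma 4.2.1 now completes the proof of Theorem 4.1.5.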



4.2.2 Action of a Classical 2k-TPE

We start by analysing the action of a classical 2k-TPE. (We consider 2k-TPEs rather than general k-TPEs since our quantum expander construction only uses these.) The fixed points are states which are unchanged when acted on by 2k copies of any permutation matrix. Since the same permutation is applied to all copies, any equal indices will remain equal and any unequal indices will remain unequal. This allows us to identify the fixed points of the classical expander: they are the sums over all states with the same equality and difference constraints. For example, for k = 1 (corresponding to a 2-TPE), the fixed points are $\sum_{n_1} |n_1, n_1\rangle$ and $\sum_{n_1 \ne n_2} |n_1, n_2\rangle$ (all off-diagonal entries equal to 1). In general, there is a fixed point for each partition of the set $\{1, 2, \ldots, 2k\}$ into at most N non-empty parts. If $N \ge 2k$, which is the only case we consider, the $2k^{\text{th}}$ Bell number $\beta_{2k}$ gives the number of such partitions (see e.g. [Sta86]).

We now write down some more notation to further analyse this. If $\Pi$ is a partition of $\{1, \ldots, 2k\}$, then we write $\Pi \vdash 2k$. We will see that $\mathcal{E}^{2k}_{S_N}$ projects onto a space spanned by vectors labelled by partitions. For a partition $\Pi$, say that $(i, j) \in \Pi$ if and only if elements $i$ and $j$ are in the same block. Now we can write down the fixed points of the classical expander. Let

\[ I_\Pi = \{(n_1, \ldots, n_{2k}) : n_i = n_j \text{ iff } (i, j) \in \Pi\}. \tag{4.2.4} \]

This is a set of tuples where indices in the same block of $\Pi$ are equal and indices in different blocks are not equal. The corresponding state is

\[ |I_\Pi\rangle = \frac{1}{\sqrt{|I_\Pi|}} \sum_{\mathbf{n} \in I_\Pi} |\mathbf{n}\rangle \tag{4.2.5} \]

where $\mathbf{n} = (n_1, \ldots, n_{2k})$. Note that the $\{I_\Pi\}_{\Pi \vdash 2k}$ form a partition of $\{1, \ldots, N\}^{2k}$ and thus the $\{|I_\Pi\rangle\}_{\Pi \vdash 2k}$ form an orthonormal basis for $V_{S_N}$. This is because, when applying the same permutation to all indices, indices that are the same remain the same and indices that differ remain different. This implies that

\[ \mathcal{E}^{2k}_{S_N} = \sum_{\Pi \vdash 2k} |I_\Pi\rangle\langle I_\Pi|. \tag{4.2.6} \]

To evaluate the normalisation, use $|I_\Pi| = (N)_{|\Pi|}$ where $(N)_n$ is the falling factorial $N(N - 1) \cdots (N - n + 1)$ and $|\Pi|$ is the number of blocks in $\Pi$. We will later find it useful to bound $(N)_n$ with

\[ \left(1 - \frac{n^2}{2N}\right) N^n \le (N)_n \le N^n. \tag{4.2.7} \]

We will also make use of the refinement partial order:

Definition 4.2.3. The refinement partial order $\le$ on partitions $\Pi, \Pi' \in \mathrm{Par}(2k, N)$ is given by

\[ \Pi \le \Pi' \text{ iff } (i, j) \in \Pi \Rightarrow (i, j) \in \Pi'. \tag{4.2.8} \]

For example, $\{\{1, 2\}, \{3\}, \{4\}\} \le \{\{1, 2, 4\}, \{3\}\}$. Note that $\Pi \le \Pi'$ implies that $|\Pi| \ge |\Pi'|$.

Turning Inequality Constraints into Equality Constraints. 

In the analysis, it will be easier to consider just equality constraints rather than both inequality and equality constraints as in $I_\Pi$. Therefore we make analogous definitions:

\[ E_\Pi = \{(n_1, \ldots, n_{2k}) : n_i = n_j \ \forall (i, j) \in \Pi\} \tag{4.2.9} \]

and

\[ |E_\Pi\rangle = \frac{1}{\sqrt{|E_\Pi|}} \sum_{\mathbf{n} \in E_\Pi} |\mathbf{n}\rangle. \tag{4.2.10} \]

Then $|E_\Pi| = N^{|\Pi|}$. For $E_\Pi$, indices in the same block are equal, as with $I_\Pi$, but indices in different blocks need not be different.

We will need relationships between $I_\Pi$ and $E_\Pi$. First, observe that $E_\Pi$ can be written as the union of some $I_\Pi$ sets:

\[ E_\Pi = \bigcup_{\Pi' \ge \Pi} I_{\Pi'}. \tag{4.2.11} \]

To see this, note that for $\mathbf{n} \in E_\Pi$, we have $n_i = n_j \ \forall (i, j) \in \Pi$, but we may also have an arbitrary number of additional equalities between $n_i$'s in different blocks. The (unique) partition $\Pi'$ corresponding to these equalities has the property that $\Pi$ is a refinement of $\Pi'$; that is, $\Pi' \ge \Pi$. Thus for any $\mathbf{n} \in E_\Pi$ there exists a unique $\Pi' \ge \Pi$ such that $\mathbf{n} \in I_{\Pi'}$. Conversely, whenever $\Pi' \ge \Pi$, we also have $I_{\Pi'} \subseteq E_{\Pi'} \subseteq E_\Pi$ because each inclusion is achieved only by relaxing constraints.

Using Eqn. 4.2.11, we can obtain a useful identity involving sums over partitions:

\[ N^{|\Pi|} = |E_\Pi| = \sum_{\Pi' \ge \Pi} |I_{\Pi'}| = \sum_{\Pi' \ge \Pi} (N)_{|\Pi'|}. \tag{4.2.12} \]

Additionally, since both sides in Eqn. 4.2.12 are degree $|\Pi|$ polynomials and are equal on $\ge |\Pi| + 1$ points (we can choose any N in Eqn. 4.2.12 with $N \ge 2k$), it implies that $x^{|\Pi|} = \sum_{\Pi' \ge \Pi} (x)_{|\Pi'|}$ as an identity on formal polynomials in $x$.
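The counting identity $N^{|\Pi|} = \sum_{\Pi' \ge \Pi} (N)_{|\Pi'|}$ can be confirmed by brute force on a small set. The sketch below is our own illustration: it represents a partition of $\{0, \ldots, n-1\}$ as a tuple of block labels (a restricted-growth string), implements the refinement order of Definition 4.2.3, and checks the identity for every partition. All function names are hypothetical.

```python
from typing import Iterator, List, Tuple

Partition = Tuple[int, ...]  # entry i is the block label of element i


def set_partitions(n: int) -> Iterator[Partition]:
    """Yield every partition of range(n) exactly once, as a restricted-growth string."""
    def rec(prefix: List[int], nblocks: int) -> Iterator[Partition]:
        if len(prefix) == n:
            yield tuple(prefix)
            return
        for b in range(nblocks + 1):
            yield from rec(prefix + [b], max(nblocks, b + 1))
    yield from rec([], 0)


def num_blocks(p: Partition) -> int:
    return len(set(p))


def refines(p: Partition, q: Partition) -> bool:
    """p <= q: any two elements in the same block of p share a block of q."""
    n = len(p)
    return all(q[i] == q[j]
               for i in range(n) for j in range(i + 1, n) if p[i] == p[j])


def falling(N: int, j: int) -> int:
    """Falling factorial (N)_j = N (N - 1) ... (N - j + 1)."""
    out = 1
    for i in range(j):
        out *= N - i
    return out


def identity_holds(n: int, N: int) -> bool:
    parts = list(set_partitions(n))
    return all(
        N ** num_blocks(p)
        == sum(falling(N, num_blocks(q)) for q in parts if refines(p, q))
        for p in parts
    )


if __name__ == "__main__":
    print(len(list(set_partitions(4))), identity_holds(4, 6))
```

Because falling factorials vanish once the number of blocks exceeds $N$, the check also passes for small $N$, in keeping with the identity holding for formal polynomials.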

The analogue of Eqn. 4.2.11 for the states $|E_\Pi\rangle$ and $|I_\Pi\rangle$ is similar but has to account for normalisation factors. Thus we have

\[ \sqrt{|E_\Pi|}\,|E_\Pi\rangle = \sum_{\Pi' \ge \Pi} \sqrt{|I_{\Pi'}|}\,|I_{\Pi'}\rangle. \tag{4.2.13} \]

We would also like to invert this relation, and write $|I_\Pi\rangle$ as a sum over various $|E_{\Pi'}\rangle$. Doing so will require introducing some more notation. Define $\zeta(\Pi, \Pi')$ to be 1 if $\Pi \le \Pi'$ and 0 if $\Pi \not\le \Pi'$. This can be thought of as a matrix that, with respect to the refinement ordering, has ones on the diagonal and is upper-triangular. Thus it is also invertible. Define $\mu(\Pi, \Pi')$ to be the matrix inverse of $\zeta$, meaning that for all $\Pi_1, \Pi_2$, we have

\[ \sum_{\Pi' \vdash 2k} \zeta(\Pi_1, \Pi')\mu(\Pi', \Pi_2) = \sum_{\Pi' \vdash 2k} \mu(\Pi_1, \Pi')\zeta(\Pi', \Pi_2) = \delta_{\Pi_1, \Pi_2}, \]

where $\delta_{\Pi_1, \Pi_2} = 1$ if $\Pi_1 = \Pi_2$ and $= 0$ otherwise. Thus, if we rewrite Eqn. 4.2.13 as

\[ \sqrt{|E_\Pi|}\,|E_\Pi\rangle = \sum_{\Pi' \vdash 2k} \zeta(\Pi, \Pi') \sqrt{|I_{\Pi'}|}\,|I_{\Pi'}\rangle, \tag{4.2.14} \]

then we can use $\mu$ to express $|I_\Pi\rangle$ in terms of the $|E_\Pi\rangle$ as

\[ \sqrt{|I_\Pi|}\,|I_\Pi\rangle = \sum_{\Pi' \vdash 2k} \mu(\Pi, \Pi') \sqrt{|E_{\Pi'}|}\,|E_{\Pi'}\rangle. \tag{4.2.15} \]

This approach is a generalisation of inclusion-exclusion known as Möbius inversion, and the function $\mu$ is called the Möbius function (see Chapter 3 of [Sta86] for more background). For the case of the refinement partial order, the Möbius function is known:

Lemma 4.2.4 ([Rot64], Section 7).

\[ \mu(\Pi, \Pi') = (-1)^{|\Pi| - |\Pi'|} \prod_{i=1}^{|\Pi'|} (b_i - 1)! \]

where $b_i$ is the number of blocks of $\Pi$ in the $i^{\text{th}}$ block of $\Pi'$.

We can use this to evaluate sums involving the Möbius function for the refinement order.

Lemma 4.2.5.

\[ \sum_{\Pi' \ge \Pi} |\mu(\Pi, \Pi')|\, x^{|\Pi'|} = x^{(|\Pi|)} \tag{4.2.16} \]

where $x$ is arbitrary and $x^{(n)}$ is the rising factorial $x(x + 1) \cdots (x + n - 1)$.

Proof. Start with $|\mu(\Pi, \Pi')| = (-1)^{|\Pi| - |\Pi'|}\mu(\Pi, \Pi')$ to obtain

\[
\begin{aligned}
\sum_{\Pi' \ge \Pi} |\mu(\Pi, \Pi')|\, x^{|\Pi'|}
 &= (-1)^{|\Pi|} \sum_{\Pi' \ge \Pi} \mu(\Pi, \Pi')(-x)^{|\Pi'|} \\
 &= (-1)^{|\Pi|} \sum_{\Pi' \ge \Pi} \mu(\Pi, \Pi') \sum_{\Pi'' \ge \Pi'} \zeta(\Pi', \Pi'')(-x)_{|\Pi''|}
\end{aligned}
\]

using Eqn. 4.2.12. Then use Möbius inversion and $(-x)_n = (-1)^n x^{(n)}$ to prove the result. □

We will mostly be interested in the special case $x = 1$:

Corollary 4.2.6.

\[ \sum_{\Pi' \ge \Pi} |\mu(\Pi, \Pi')| = |\Pi|! \tag{4.2.17} \]
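Lemma 4.2.4 and Corollary 4.2.6 can be checked on a small example. In the sketch below (our own illustration, with hypothetical function names) the Möbius function is computed directly from its defining recursion $\mu(\Pi, \Pi) = 1$, $\mu(\Pi, \Pi') = -\sum_{\Pi \le \Pi'' < \Pi'} \mu(\Pi, \Pi'')$, then compared with the closed form and with the sum $|\Pi|!$.

```python
from math import factorial
from typing import Dict, Iterator, List, Tuple

Partition = Tuple[int, ...]  # entry i is the block label of element i


def set_partitions(n: int) -> Iterator[Partition]:
    """Yield every partition of range(n) as a restricted-growth string."""
    def rec(prefix: List[int], nblocks: int) -> Iterator[Partition]:
        if len(prefix) == n:
            yield tuple(prefix)
            return
        for b in range(nblocks + 1):
            yield from rec(prefix + [b], max(nblocks, b + 1))
    yield from rec([], 0)


def refines(p: Partition, q: Partition) -> bool:
    """p <= q in the refinement order."""
    n = len(p)
    return all(q[i] == q[j]
               for i in range(n) for j in range(i + 1, n) if p[i] == p[j])


def mobius_closed_form(p: Partition, q: Partition) -> int:
    """(-1)^(|p| - |q|) * prod_i (b_i - 1)!, valid when p <= q."""
    inside: Dict[int, set] = {}
    for label_p, label_q in zip(p, q):
        inside.setdefault(label_q, set()).add(label_p)
    value = (-1) ** (len(set(p)) - len(set(q)))
    for blocks_of_p in inside.values():
        value *= factorial(len(blocks_of_p) - 1)
    return value


def mobius_by_recursion(n: int) -> Dict[Tuple[Partition, Partition], int]:
    """mu(p, q) for all p <= q, from the defining recursion."""
    # Finest partitions first, so every r with p <= r < q is handled before q.
    parts = sorted(set_partitions(n), key=lambda p: -len(set(p)))
    mu: Dict[Tuple[Partition, Partition], int] = {}
    for p in parts:
        for q in parts:
            if not refines(p, q):
                continue
            if p == q:
                mu[p, q] = 1
            else:
                mu[p, q] = -sum(mu[p, r] for r in parts
                                if r != q and refines(p, r) and refines(r, q))
    return mu


if __name__ == "__main__":
    mu = mobius_by_recursion(4)
    print(all(v == mobius_closed_form(p, q) for (p, q), v in mu.items()))
```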

Using $|\mu(\Pi, \Pi')| \ge 1$ and the fact that $\Pi \ge \{\{1\}, \ldots, \{n\}\}$ for all $\Pi \vdash n$, we obtain a bound on the total number of partitions:

Corollary 4.2.7. The Bell numbers $\beta_n$ satisfy $\beta_n \le n!$.

4.2.3 Fixed Points of a Quantum Expander

We now turn to $V_{U(N)}$, the space fixed by the quantum expander. As in Chapter 2, the only operators on $(\mathbb{C}^N)^{\otimes k}$ to commute with $U^{\otimes k}$ for all $U$ are linear combinations of subsystem permutations. The equivalent statement for $V_{U(N)}$ is that the only states invariant under all $U^{\otimes k,k}$ are of the form

\[ \frac{1}{\sqrt{N^k}} \sum_{n_1, \ldots, n_k \in [N]} |n_1, \ldots, n_k, n_{\pi(1)}, \ldots, n_{\pi(k)}\rangle, \tag{4.2.18} \]

for some permutation $\pi \in S_k$. Since $\mathcal{E}^{k}_{U(N)} = \mathbb{E}_U\left[U^{\otimes k,k}\right]$ projects onto the set of states that is invariant under all $U^{\otimes k,k}$, it follows that $V_{U(N)}$ is equal to the span of the states in Eqn. 4.2.18.

Now we relate these states to our previous notation.

Definition 4.2.8. For $\pi \in S_k$, define the partition corresponding to $\pi$ by

\[ P(\pi) = \{\{1, k + \pi(1)\}, \{2, k + \pi(2)\}, \ldots, \{k, k + \pi(k)\}\}. \]

Then the state in Eqn. 4.2.18 is simply $|E_{P(\pi)}\rangle$, and so

\[ V_{U(N)} = \mathrm{span}\{|E_{P(\pi)}\rangle : \pi \in S_k\}. \tag{4.2.19} \]

Note that the classical expander has many more fixed points than just the desired $|E_{P(\pi)}\rangle$. The main task in constructing a quantum expander from a classical one is to modify the classical expander to decay the fixed points that should not be fixed by the quantum expander.

4.2.4 Fourier Transform in the Matrix Element Basis

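Since we make use of the Fourier transform, we will need to know how it acts on a matrix element. We find

\[ \mathcal{F}^{\otimes k,k}|\mathbf{m}\rangle = N^{-k} \sum_{\mathbf{n}} \omega^{\mathbf{m}.\mathbf{n}} |\mathbf{n}\rangle \]

where

\[ \mathbf{m}.\mathbf{n} = m_1 n_1 + \ldots + m_k n_k - m_{k+1} n_{k+1} - \ldots - m_{2k} n_{2k}. \tag{4.2.20} \]

We will also find it convenient to estimate the matrix elements $\langle E_{\Pi_1}|\mathcal{F}^{\otimes k,k}|E_{\Pi_2}\rangle$. The properties we require are proven in the following lemmas.

Lemma 4.2.9. Choose any $\Pi_1, \Pi_2 \vdash 2k$. Let $\mathbf{m} \in E_{\Pi_1}$ and $\mathbf{n} \in E_{\Pi_2}$. Call the free indices of $\mathbf{m}$ $\tilde m_i$ for $1 \le i \le |\Pi_1|$. Then let $\mathbf{m}.\mathbf{n} = \sum_{i=1}^{|\Pi_1|} \sum_{j=1}^{2k} \tilde m_i A_{ij} n_j$ where $A_{ij}$ is a $|\Pi_1| \times 2k$ matrix with entries in $\{0, 1, -1\}$ which depends on $\Pi_1$ (but not $\Pi_2$). Then

\[
\langle E_{\Pi_1}|\mathcal{F}^{\otimes k,k}|E_{\Pi_2}\rangle = N^{-k + \frac{|\Pi_1| - |\Pi_2|}{2}} \sum_{\mathbf{n} \in E_{\Pi_2}} \mathbb{I}\left(\sum_j A_{ij} n_j \equiv 0 \bmod N \ \forall i\right) \tag{4.2.21}
\]

where $\mathbb{I}$ is the indicator function.

Proof. Simply perform the $\mathbf{m}$ sum in

\[
\langle E_{\Pi_1}|\mathcal{F}^{\otimes k,k}|E_{\Pi_2}\rangle = N^{-\left(k + \frac{|\Pi_1| + |\Pi_2|}{2}\right)} \sum_{\mathbf{m} \in E_{\Pi_1}} \sum_{\mathbf{n} \in E_{\Pi_2}} \omega^{\mathbf{m}.\mathbf{n}}. \tag{4.2.22}
\]

□

Lemma 4.2.10. $\langle E_{\Pi_1}|\mathcal{F}^{\otimes k,k}|E_{\Pi_2}\rangle$ is real and positive.

Proof. Since all entries in the sum in Eqn. 4.2.21 are nonnegative and at least one ($\mathbf{n} = 0$) is strictly positive, Lemma 4.2.9 implies the result. □

Lemma 4.2.11. If $\Pi'_1 \le \Pi_1$ and $\Pi'_2 \le \Pi_2$ then

\[
\sqrt{|E_{\Pi_1}| \cdot |E_{\Pi_2}|}\,\langle E_{\Pi_1}|\mathcal{F}^{\otimes k,k}|E_{\Pi_2}\rangle \le \sqrt{|E_{\Pi'_1}| \cdot |E_{\Pi'_2}|}\,\langle E_{\Pi'_1}|\mathcal{F}^{\otimes k,k}|E_{\Pi'_2}\rangle. \tag{4.2.23}
\]

Proof. We prove first the special case when $\Pi'_1 = \Pi_1$, but $\Pi'_2 \le \Pi_2$ is arbitrary. Recall that $\Pi'_2 \le \Pi_2$ implies that $E_{\Pi_2} \subseteq E_{\Pi'_2}$. Now the LHS of Eqn. 4.2.23 equals

\[
\begin{aligned}
N^{-k} \sum_{\mathbf{m} \in E_{\Pi_1}, \mathbf{n} \in E_{\Pi_2}} \exp\left(\frac{2\pi i}{N} \mathbf{m}.\mathbf{n}\right)
 &= N^{|\Pi_1| - k} \sum_{\mathbf{n} \in E_{\Pi_2}} \mathbb{I}\left(\sum_j A_{ij} n_j \equiv 0 \bmod N \ \forall i\right) \\
 &= N^{|\Pi_1| - k} \sum_{\mathbf{n} \in E_{\Pi'_2}} \mathbb{I}(\mathbf{n} \in E_{\Pi_2})\, \mathbb{I}\left(\sum_j A_{ij} n_j \equiv 0 \bmod N \ \forall i\right) \\
 &\le N^{|\Pi_1| - k} \sum_{\mathbf{n} \in E_{\Pi'_2}} \mathbb{I}\left(\sum_j A_{ij} n_j \equiv 0 \bmod N \ \forall i\right) \\
 &= \sqrt{|E_{\Pi_1}| \cdot |E_{\Pi'_2}|}\,\langle E_{\Pi_1}|\mathcal{F}^{\otimes k,k}|E_{\Pi'_2}\rangle,
\end{aligned}
\]

as desired. To prove Eqn. 4.2.23 we repeat this argument, interchanging the roles of $\Pi_1$ and $\Pi_2$ and use the fact that $\langle E_{\Pi_1}|\mathcal{F}^{\otimes k,k}|E_{\Pi_2}\rangle$ is symmetric in $\Pi_1$ and $\Pi_2$. □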

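Lemma 4.2.12.

\[ \langle E_{\Pi_1}|\mathcal{F}^{\otimes k,k}|E_{\Pi_2}\rangle \le N^{-\left|\frac{|\Pi_1| + |\Pi_2|}{2} - k\right|}. \tag{4.2.24} \]

Proof. Here, there are two cases to consider. The simpler case is when $|\Pi_1| + |\Pi_2| \le 2k$. Here we simply apply the inequality

\[ \sum_{\mathbf{m} \in E_{\Pi_1}, \mathbf{n} \in E_{\Pi_2}} \exp\left(\frac{2\pi i}{N} \mathbf{m}.\mathbf{n}\right) \le |E_{\Pi_1}| \cdot |E_{\Pi_2}| = N^{|\Pi_1| + |\Pi_2|} \]

to Eqn. 4.2.22 and conclude that $\langle E_{\Pi_1}|\mathcal{F}^{\otimes k,k}|E_{\Pi_2}\rangle \le N^{\frac{|\Pi_1| + |\Pi_2|}{2} - k}$.

Next, we would like to prove that

\[ \langle E_{\Pi_1}|\mathcal{F}^{\otimes k,k}|E_{\Pi_2}\rangle \le N^{k - \frac{|\Pi_1| + |\Pi_2|}{2}}. \tag{4.2.25} \]

Here we use Lemma 4.2.11 with $\Pi'_1 = \Pi_1$ and $\Pi'_2 = \{\{1\}, \{2\}, \ldots, \{2k\}\}$, the maximally refined partition. Note that $|E_{\Pi'_2}| = N^{2k}$ and $\mathcal{F}^{\otimes k,k}|E_{\Pi'_2}\rangle = |0\rangle$. Thus

\[ \sqrt{|E_{\Pi_1}| \cdot |E_{\Pi_2}|}\,\langle E_{\Pi_1}|\mathcal{F}^{\otimes k,k}|E_{\Pi_2}\rangle \le \sqrt{|E_{\Pi_1}| N^{2k}}\,\langle E_{\Pi_1}|0\rangle = N^k, \]

establishing Eqn. 4.2.25. □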

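Lemma 4.2.13. If $\Pi_1 = \Pi_2 = P(\pi)$ then $\langle E_{\Pi_1}|\mathcal{F}^{\otimes k,k}|E_{\Pi_2}\rangle = 1$. If, for any $\Pi_1, \Pi_2$ with $|\Pi_1| + |\Pi_2| = 2k$, either condition isn't met (i.e. either $\Pi_1 \ne \Pi_2$ or there does not exist $\pi \in S_k$ such that $P(\pi) = \Pi_1 = \Pi_2$) then

\[ \langle E_{\Pi_1}|\mathcal{F}^{\otimes k,k}|E_{\Pi_2}\rangle \le \frac{2k}{N} \tag{4.2.26} \]

for $N > k$.

Proof. In Lemma 4.2.14, we introduce the $|\Pi_1| \times |\Pi_2|$ matrix $\tilde A$ with the property that

\[ \mathbf{m}.\mathbf{n} = \sum_{i=1}^{|\Pi_1|} \sum_{j=1}^{|\Pi_2|} \tilde m_i \tilde A_{ij} \tilde n_j \tag{4.2.27} \]

for all $\mathbf{m} \in E_{\Pi_1}$ and $\mathbf{n} \in E_{\Pi_2}$ where $\tilde m_i$ and $\tilde n_j$ are the free indices of $\mathbf{m}$ and $\mathbf{n}$. This is similar to the matrix $A$ introduced in Lemma 4.2.9 except only the free indices of $\mathbf{n}$ are considered.

For $\Pi_1 = \Pi_2 = P(\pi)$, Lemma 4.2.14 implies that $\tilde A = 0$, or equivalently $\mathbf{m}.\mathbf{n} = 0$ for all $\mathbf{m}, \mathbf{n} \in E_{P(\pi)}$. Using $|\Pi_1| + |\Pi_2| = 2k$, $\langle E_{\Pi_1}|\mathcal{F}^{\otimes k,k}|E_{\Pi_2}\rangle = 1$.

Otherwise we have $(\Pi_1, \Pi_2) \notin \{(P(\pi), P(\pi)) : \pi \in S_k\}$ with $|\Pi_1| + |\Pi_2| = 2k$. For all these, Lemma 4.2.14 implies that $\tilde A$ is nonzero (for $N > k$, no entry of $\tilde A$ has magnitude $N$ or more, so $\tilde A \equiv 0 \bmod N$ is equivalent to $\tilde A = 0$). Fix an $i$ for which the $i^{\text{th}}$ row of $\tilde A$ is nonzero. We wish to count the number of $(\tilde n_1, \ldots, \tilde n_{|\Pi_2|})$ such that $\sum_j \tilde A_{ij} \tilde n_j \equiv 0 \bmod N$. Assume that each $\tilde A_{ij}$ divides $N$ and is nonnegative; if not, we can replace $\tilde A_{ij}$ with $\mathrm{GCD}(|\tilde A_{ij}|, N)$ by a suitable change of variable for $\tilde n_j$.

Now choose an arbitrary $j$ such that $\tilde A_{ij} \ne 0$. For any values of $\tilde n_1, \ldots, \tilde n_{j-1}, \tilde n_{j+1}, \ldots, \tilde n_{|\Pi_2|}$, there are $|\tilde A_{ij}| \le 2k$ choices of $\tilde n_j$ such that $\sum_j \tilde A_{ij} \tilde n_j \equiv 0 \bmod N$. Thus, there are $\le 2kN^{|\Pi_2| - 1}$ choices of $\tilde{\mathbf{n}}$ such that $\sum_j \tilde A_{ij} \tilde n_j \equiv 0 \bmod N$. Substituting this into Eqn. 4.2.21 (which we can trivially modify to apply for $\tilde A$ rather than just $A$), we find that

\[ \langle E_{\Pi_1}|\mathcal{F}^{\otimes k,k}|E_{\Pi_2}\rangle \le N^{-k + \frac{|\Pi_1| - |\Pi_2|}{2}} \cdot 2kN^{|\Pi_2| - 1} = \frac{2k}{N}, \]

thus establishing Eqn. 4.2.26. □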

Lemma 4.2.14. Let A be the matrix such that m.n = X]!=i' Y^^i i^i-AijUj for all 
m G Hi and n G 112 where rhj and hj are the free indices of m and n. Then A = Q 
if and only i/ Hi, 112 > Pi^^) for some tt £ Sk- 

Proof. We first consider Hi = 112 = Pi'^) for the "if" direction. Note that for any 
m, n G Ep(^^-^, we have 

k k 

m.n = ^ nijUj - ^ m^f^j^n^f^j^ = 0. (4.2.28) 

This implies that ^ = 0. Now, choose any IIi > P{tt) and 112 ^ P{'^)- Then for any 
m G Hi and n G 112, m, n G P{tt)- This means Eqn. 14.2.281 holds for this case so 
^ = also. 

On the other hand, suppose that A = 0. We will argue that this implies the 
existence of a permutation n such that Hi, 112 > -P(^)) thus establishing the "only if" 
direction. 

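Let $\Pi_{1,i}$ (resp. $\Pi_{2,j}$) denote the $i$-th (resp. $j$-th) block of $\Pi_1$ (resp. $\Pi_2$). Then

$$\tilde{A}_{ij} = \sum_{i' \in \Pi_{1,i}} \sum_{j' \in \Pi_{2,j}} A_{i'j'}$$

where $A_{i'j'}$ is defined to be

$$A_{i'j'} = \begin{cases} 1 & \text{if } i' = j' \in \{1, \ldots, k\} \\ -1 & \text{if } i' = j' \in \{k+1, \ldots, 2k\} \\ 0 & \text{if } i' \neq j'. \end{cases}$$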

If $\tilde{A} = 0$ then for each $i, j$ we have

$$|\Pi_{1,i} \cap \Pi_{2,j} \cap \{1, \ldots, k\}| = |\Pi_{1,i} \cap \Pi_{2,j} \cap \{k+1, \ldots, 2k\}|. \quad (4.2.29)$$

Denote the meet of $\Pi_1$ and $\Pi_2$, $\Pi_1 \wedge \Pi_2$, to be the greatest lower bound of $\Pi_1$ and $\Pi_2$,
or equivalently the unique partition with the fewest blocks that satisfies $\Pi_1 \wedge \Pi_2 \le \Pi_1$
and $\Pi_1 \wedge \Pi_2 \le \Pi_2$. The blocks of $\Pi_1 \wedge \Pi_2$ are simply all of the nonempty sets
$\Pi_{1,i} \cap \Pi_{2,j}$, for $i = 1, \ldots, |\Pi_1|$ and $j = 1, \ldots, |\Pi_2|$. Thus, Eqn. 4.2.29 implies that
each block of $\Pi_1 \wedge \Pi_2$ contains an equal number of indices from $\{1, \ldots, k\}$ as it does
from $\{k+1, \ldots, 2k\}$. This implies the existence of a permutation $\pi \in S_k$ such that
$\{i, k + \pi(i)\}$ is contained in a single block of $\Pi_1 \wedge \Pi_2$ for each $i = 1, \ldots, k$. Equivalently
$\Pi_1 \wedge \Pi_2 \ge P(\pi)$, implying that $\Pi_1 \ge P(\pi)$ and $\Pi_2 \ge P(\pi)$. □
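
The statement of Lemma 4.2.14 can be checked by brute force for small $k$. The following is only an illustrative sketch, not part of the original argument: it assumes 0-based indices (first half $\{0, \ldots, k-1\}$, second half $\{k, \ldots, 2k-1\}$), represents a partition as a list of blocks, and all function names (`set_partitions`, `reduced_matrix`, `dominates_pairing`, `lemma_holds`) are hypothetical helpers introduced here.

```python
from itertools import permutations


def set_partitions(elements):
    """Yield every partition of `elements` as a list of blocks (lists)."""
    if not elements:
        yield []
        return
    first, rest = elements[0], elements[1:]
    for partition in set_partitions(rest):
        # Put `first` into each existing block in turn ...
        for idx, block in enumerate(partition):
            yield partition[:idx] + [[first] + block] + partition[idx + 1:]
        # ... or into a new block of its own.
        yield [[first]] + partition


def reduced_matrix(p1, p2, k):
    """Entry (i, j): shared indices in the first half minus shared indices in the second."""
    return [
        [sum(1 if x < k else -1 for x in set(b1) & set(b2)) for b2 in p2]
        for b1 in p1
    ]


def dominates_pairing(partition, perm, k):
    """True if every pair {i, k + perm[i]} lies inside one block (partition >= P(perm))."""
    block_of = {x: idx for idx, block in enumerate(partition) for x in block}
    return all(block_of[i] == block_of[k + perm[i]] for i in range(k))


def lemma_holds(k):
    """Compare both sides of the 'if and only if' for all pairs of partitions of 2k indices."""
    partitions = list(set_partitions(list(range(2 * k))))
    perms = list(permutations(range(k)))
    for p1 in partitions:
        for p2 in partitions:
            is_zero = all(v == 0 for row in reduced_matrix(p1, p2, k) for v in row)
            has_perm = any(
                dominates_pairing(p1, s, k) and dominates_pairing(p2, s, k)
                for s in perms
            )
            if is_zero != has_perm:
                return False
    return True
```

Calling `lemma_holds(2)` enumerates all pairs of partitions of four indices; the exhaustive search is only practical for very small $k$ because the number of partitions grows as the Bell numbers.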

4.2.5 Proof of Lemma 4.2.2 

Proof. We would like to show that, for any unit vector $|\psi\rangle \in V_0$, $|\langle\psi|\mathcal{F}^{\otimes k,k}|\psi\rangle|^2 \le
2(2k)^{4k}/\sqrt{N}$. Our strategy will be to calculate the matrix elements of $\mathcal{F}^{\otimes k,k}$ in the
$|I_\Pi\rangle$ and $|E_\Pi\rangle$ bases. While the $|I_\Pi\rangle$ states are orthonormal, we will see that the
$\langle E_{\Pi_1}|\mathcal{F}^{\otimes k,k}|E_{\Pi_2}\rangle$ matrix elements are easier to calculate. We then use Möbius functions
to express $|I_\Pi\rangle$ in terms of $|E_\Pi\rangle$.

Consider the matrix S^'^T^^'^S^^. It has k\ unit eigenvalues, corresponding to 
the A;!-dimensional space Vi^(Af)- Call the k\ + l'^'^ largest eigenvalue Xa- We bound Xa 
with



kl + Xl< tr £ZT^'''£Z 



ni,n2i-2fc 



(4.2.30) 



We divide the terms in Eqn. 4.2.30 into four types.



1. The leading-order contribution comes from the $k!$ terms of the form $\Pi_1 = \Pi_2 =
P(\pi)$ for $\pi \in S_k$. We bound them with the trivial upper bound



\{IuA^^'''Vn,)\' <1 



(4.2.31a) 



(which turns out to be nearly tight). We will then show that the remaining 
terms are 

2. If $|\Pi_1| + |\Pi_2| < 2k$ then



< 



|/nJ-|/n,|iV2/c 



E 

meni 
nen2 



e N 



N2k 



1 



< yY|ni|+|n2|-2fe < J_ 
-AT' 



where in the last line we have used the fact that $|I_\Pi| \le |E_\Pi| = N^{|\Pi|}$.



(4.2.31b) 



3. If $|\Pi_1| + |\Pi_2| > 2k$ then we will show that



< 



4- (2A;! 

'n' 



1^2 



(4.2.31c) 



4. If $|\Pi_1| + |\Pi_2| = 2k$ but either $\Pi_1 \neq \Pi_2$ or there is no $\pi \in S_k$ satisfying $P(\pi) =
\Pi_1 = \Pi_2$, then we will show that



2 ^ {{2k)l + 2kf ^ 4- (2A:!)2 



N 



(4.2.31d) 



To establish these last two claims, we will find it useful to express $|I_\Pi\rangle$ in terms of
the various $|E_\Pi\rangle$ states.

Lemmas 4.2.12 and 4.2.13 can now be used together with the Möbius function to
bound $|\langle I_{\Pi_1}|\mathcal{F}^{\otimes k,k}|I_{\Pi_2}\rangle|^2$. First, suppose $|\Pi_1| + |\Pi_2| > 2k$. Then



< 



n'i>ni 
n'>n2 



E 

n'i>ni 





\E-n' 
1 lij 1 






l\Eni\ 


\Er\' , 
1 ii2 1 1 



f,(U,,n[)f,(U2M\{En'^\^'''\En 



< 



n'>U2 



|/i(n;,ni)/.(n'2,n2)| 



by Lemma 4.2.12. Then, using Corollary 4.2.6, we find

TV'^inii! |n2|! 



< 



(iV)|n,|(iV)|n2| 
2- (2A:)! 



(4.2.32) 



N 



In the last step, we have assumed that $4k^2 \le N$, so that $(N)_\ell \ge N^\ell/2$ for any $\ell \le 2k$.
We have also made use of the fact that (still assuming $4k^2 \le N$) Eqn. 4.2.32 is
maximised when $|\Pi_1| + |\Pi_2| = 2k + 1$, and in particular, when one of $|\Pi_1|$, $|\Pi_2|$ is
equal to $2k$ and the other is equal to 1.
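
The falling-factorial estimate used in the last step follows from $(N)_\ell / N^\ell = \prod_{i<\ell}(1 - i/N) \ge 1 - \ell(\ell-1)/(2N)$, which is at least $1/2$ once $\ell \le 2k$ and $4k^2 \le N$. A small sketch that checks this with exact integer arithmetic (the function names are illustrative, not from the text):

```python
def falling_factorial(n, ell):
    """(n)_ell = n (n - 1) ... (n - ell + 1), with (n)_0 = 1."""
    result = 1
    for i in range(ell):
        result *= n - i
    return result


def bound_holds(k, n):
    """Check (N)_l >= N^l / 2 for every 0 <= l <= 2k, using exact integers."""
    return all(2 * falling_factorial(n, ell) >= n ** ell for ell in range(2 * k + 1))
```

For example, `bound_holds(k, 4 * k * k)` is true for every small `k` tried, while `bound_holds(2, 4)` fails, showing that some lower bound on $N$ in terms of $k$ is needed.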

A similar analysis applies to the pairs $\Pi_1, \Pi_2$ with $|\Pi_1| + |\Pi_2| = 2k$, but with
$(\Pi_1, \Pi_2) \notin \{(P(\pi), P(\pi)) : \pi \in S_k\}$. In this case,



^ — , / -Err' -E'n' , , 

>v ,/_' ^^i' ' ii2l,,m. TTM,,m„ ttM/ F_. I -r8)A;,A:| 



n'j^>ni,n^>n2 
(n;,n^)#(ni,n2) 



(4.2.33) 

We now use Lemmas 4.2.13 and 4.2.12 to bound each of the two terms. For the first
term, we use Eqn. 4.2.26 to upper bound it with $2k/N$. For each choice of $\Pi_1'$ and
$\Pi_2'$ in the second sum, we have $|\Pi_1'| + |\Pi_2'| \le 2k - 1$. Thus we can upper bound the
absolute value of the second term in Eqn. 4.2.33 with



(n'j,n^)7^(ni,n2) 

(2fc)! 



N 



< 



N 



We combine the two terms and square to establish Eqn. 4.2.31d.

We now put together the components from Eqn. 4.2.31 to upper bound Eqn. 4.2.30,
and find that

$$k! + \lambda_A^2 \le k! + \beta_{2k}^2\, \frac{4 \cdot ((2k)!)^2}{N},$$

implying that $\lambda_A \le 2\beta_{2k}(2k)!/\sqrt{N} \le 2(2k)^{4k}/\sqrt{N}$. This concludes the proof of
Lemma 4.2.2. □



4.3 Conclusions 

We have shown how efficient quantum tensor product expanders can be constructed
from efficient classical tensor product expanders. This immediately yields an efficient
construction of unitary $k$-designs for any $k$. Unfortunately our results do not work for
all dimensions; we require the dimension to be $\Omega((2k)^{8k})$. While tighter analysis of
our construction could likely improve this, our construction does not work for $N < 2k$.
Constructing expanders for all dimensions remains an open problem.
Chapter 5



Applications of Designs 

In this chapter we first survey known applications of designs from a wide variety
of areas. Then we present new results applying designs to derandomise some large
deviation bounds.

5.1 Review of Applications 

As we have already discussed, random unitaries and random states have many appli- 
cations. For some of these applications a design is sufficient since only the first few 
moments of the distribution are required to be equal to those of the Haar measure. 

5.1.1 Quantum Cryptography 

The first applications we discuss are to quantum cryptography. In classical cryptography,
the one-time pad is the most basic operation that perfectly encrypts a message
using a key that is the same length as the message. In quantum cryptography the
analogue is a quantum operation $\mathcal{E}$ such that for all input states $\rho$, $\mathcal{E}(\rho) = \rho_0$, with the
requirement that given a secret key Bob can decode Alice's message perfectly. Since
all states are encoded to $\rho_0$, Eve cannot learn anything about the message without
knowing the key. In this section all logs will be taken to base 2.

If $\rho_0$ is the maximally mixed state $I/d$, then the map $\mathcal{E}$ is a unitary 1-design. Therefore using $2n$
bits of key to label the Pauli operators provides a quantum one-time pad. In fact, in
[AMTdW00] it is shown that $2n$ bits of key are also necessary for a quantum one-time
pad. Therefore this unitary 1-design is of optimal size.
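
As a concrete illustration for a single qubit ($n = 1$, so two key bits), the sketch below encrypts a density matrix with one of the four Pauli operators and checks that the key-averaged ciphertext is $I/2$. It is a minimal pure-Python sketch under the assumption that the key simply indexes $\{I, X, Z, Y\}$; the names `PAULIS`, `encrypt`, `decrypt` and `eavesdropper_view` are hypothetical.

```python
from itertools import product

# Two key bits label the four single-qubit Pauli operators.
PAULIS = {
    (0, 0): [[1, 0], [0, 1]],      # I
    (0, 1): [[0, 1], [1, 0]],      # X
    (1, 0): [[1, 0], [0, -1]],     # Z
    (1, 1): [[0, -1j], [1j, 0]],   # Y
}


def matmul(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][l] * b[l][j] for l in range(2)) for j in range(2)] for i in range(2)]


def dagger(a):
    """Conjugate transpose of a 2x2 matrix."""
    return [[complex(a[j][i]).conjugate() for j in range(2)] for i in range(2)]


def encrypt(rho, key):
    """Alice applies the Pauli selected by the key: rho -> P rho P^dagger."""
    p = PAULIS[key]
    return matmul(matmul(p, rho), dagger(p))


def decrypt(sigma, key):
    """Bob undoes the same Pauli: sigma -> P^dagger sigma P."""
    p = PAULIS[key]
    return matmul(matmul(dagger(p), sigma), p)


def eavesdropper_view(rho):
    """State seen without the key: the average ciphertext over a uniform key."""
    total = [[0j, 0j], [0j, 0j]]
    for key in product((0, 1), repeat=2):
        cipher = encrypt(rho, key)
        for i in range(2):
            for j in range(2):
                total[i][j] += cipher[i][j] / 4
    return total
```

For any valid single-qubit density matrix `rho`, `eavesdropper_view(rho)` is the maximally mixed state up to rounding, while `decrypt(encrypt(rho, key), key)` returns `rho`.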

It is interesting to note that $2n$ bits of key are required rather than just $n$ for the
classical one-time pad. This is related to the fact that quantum states allow superdense
coding [BW92], which allows two classical bits to be sent per qubit. Although
it is not possible to use a shorter key for exact encryption, it would be desirable to
shorten the key if we can tolerate Eve learning a small amount of information about
the message. In [AS04], they consider closeness in the 1-norm, and set $\rho_0 = I/d$. They
define a map $\mathcal{E}$ as an $\epsilon$-approximate quantum encryption scheme if for all $\rho$

$$\|\mathcal{E}(\rho) - I/d\|_1 \le \epsilon. \quad (5.1.1)$$

Thus we see an $\epsilon$-approximate unitary 1-design according to (for example) Definition
2.2.10 (DIAMOND) suffices. In [AS04], they present an efficient construction that
satisfies Eqn. 5.1.1 with $n + 2\log n + 2\log(1/\epsilon) + O(1)$ bits of key. While this does not
immediately provide an $\epsilon$-approximate 1-design according to any of our definitions
with only $(1 + o(1))n$ bits of key, Eqn. 5.1.1 is a valid definition of an $\epsilon$-approximate
unitary 1-design. This key length was further improved by Dickinson and Nayak
[DN06] to $n + 2\log(1/\epsilon) + O(1)$ and their construction is efficient.

A stronger definition for $\epsilon$-approximate encryption was given in [HLSW04]. They
define a map $\mathcal{E}$ to be an $\epsilon$-approximate quantum encryption scheme if for all $\rho$

$$\|\mathcal{E}(\rho) - I/d\|_\infty \le \epsilon/d. \quad (5.1.2)$$

This implies the 1-norm bound in Eqn. 5.1.1 but a dimension factor is lost when
converting the other way. This could be used as yet another approximate 1-design
definition. However, there are no known efficient constructions of such $\infty$-norm randomising
maps. In [HLSW04] they provide an inefficient randomised construction
with key length $n + \log n + 2\log(1/\epsilon) + O(1)$. Their method is to show that with
non-zero probability random unitaries suffice.

This result was improved by Aubrun in [Aub09] to reduce the key length to $n +
2\log(1/\epsilon) + O(1)$. The method is the same as [HLSW04] except the analysis is tighter.
Aubrun also makes a step towards finding an efficient construction by showing that
the unitaries can be Pauli matrices, which can be implemented efficiently, although
the sampling is still inefficient.

Besides cryptographic applications, it is shown in [HLSW04] that $\infty$-norm randomising
maps can be used to hide correlations from local operations and classical
communication (LOCC) and have applications to data hiding (see later) and locking
of classical correlations [DHL+04], whereby classical correlations can be hidden but
unlocked by a very short key.

The last cryptography example we give is that of non-malleable encryption given
in [ABW09]. Here the authors not only consider hiding information from Eve but
they also require that she cannot change the message. Of course, Eve could always
replace the message with some fixed state or do nothing, so according to [ABW09],
an encryption scheme is non-malleable if these (or a convex combination) are the only
operations Eve can perform on the encoded data. The main result of this paper is
that a unitary 2-design is necessary and sufficient. They then show, as do Gross et
al. [GAE07], that a 2-design requires at least $(d^2 - 1)^2 + 1$ unitaries, i.e. the key must
be at least $4n - o(1)$ bits long. Even for approximate encryption (which can be seen
as an approximate 2-design) the key length is essentially the same.
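
To see how the count $(d^2 - 1)^2 + 1$ translates into a key length, the following sketch computes the number of whole bits needed to index that many unitaries for $d = 2^n$. The function names are illustrative only; the arithmetic uses exact integers.

```python
def min_two_design_size(n):
    """Lower bound (d^2 - 1)^2 + 1 on the number of unitaries, for d = 2^n."""
    d = 2 ** n
    return (d * d - 1) ** 2 + 1


def min_key_bits(n):
    """Whole bits needed to index that many unitaries: ceil(log2(count))."""
    return (min_two_design_size(n) - 1).bit_length()
```

Since $d^4/2 < (d^2 - 1)^2 + 1 < d^4$ for $d \ge 2$, `min_key_bits(n)` evaluates to `4 * n`, in line with the $4n - o(1)$ statement above.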

5.1.2 Measurement 

In some cases a random measurement is a good choice but cannot be performed
efficiently. One example of such a result is:

Theorem 5.1.1 (Sen, [Sen05]). Let $\rho_1$ and $\rho_2$ be any mixed states with $r(\rho_1) + r(\rho_2) \le
\sqrt{d}/C$ for a sufficiently large constant $C$. Here, $r(\rho)$ is the rank of the state $\rho$. Then

$$\mathbb{E}_M \|M(\rho_1) - M(\rho_2)\|_1 = \Omega(\|\rho_1 - \rho_2\|_2) \quad (5.1.3)$$

where $M$ is an orthonormal basis picked from the Haar measure. Here, $M(\rho)$ is the
probability distribution of outcomes according to the POVM $M$.

Since a large 1-norm distance between probability distributions means the distributions
are easily distinguishable, this result places a lower bound on the distinguishability
of the states $\rho_1$ and $\rho_2$ in terms of their 2-norm distance.

In [AE07], Ambainis and Emerson show that a POVM made from a state 4-design
achieves the bound in Eqn. 5.1.3. In fact, an $\epsilon$-approximate state 4-design suffices,
provided that $\epsilon = O(\|\rho_1 - \rho_2\|_2)$. To ensure the POVM is suitably normalised, we
insist here that the approximate 4-design is also an exact 1-design rather than an
$\epsilon$-approximate 1-design, which is all that Definition 2.2.9 ensures.

In [IR06], Iblisdir and Roland consider a slightly different measurement problem
for which a random measurement achieves the best outcome. The setting is that Alice
chooses a random pure state (the authors only consider the case that Alice's system
is 2-dimensional, i.e. a single qubit) from the Haar measure and makes $k$ copies of it.
Bob then has to find a state with high overlap with the given state. The POVM that
achieves the optimum is [MP95]

$$(k + 1)\, |\psi\rangle\langle\psi|^{\otimes k}\, d\psi. \quad (5.1.4)$$

From Lemma 2.2.2 the average of this is the projector onto the symmetric subspace
of $k$ qubits. While this is not the identity, no other outcomes are possible because
the input state is symmetric. The states in the POVM can be replaced by a state
$k$-design and in [IR06] the authors present a construction of a state $k$-design for all
$k$, although only for one qubit.

5.1.3 Average Gate Fidelity 

When implementing a quantum operation, we would like to know how far the actual
operation is from the desired one. One way of measuring this is the average gate fidelity
[Nie02]:

$$F(\mathcal{E}, U) = \int d\psi\, \langle\psi| U^\dagger \mathcal{E}(|\psi\rangle\langle\psi|)\, U |\psi\rangle \quad (5.1.5)$$

where $\mathcal{E}$ is the operation implemented and $U$ is the desired unitary. We see immediately,
following [DCEL06], that the integrand is a balanced polynomial of degree 2
so the Haar measure on states can be replaced by a state 2-design. We can even use
an approximate design if the average fidelity only needs to be known approximately.
By repeatedly sampling from the design we can obtain an estimate of the average to
$1/\mathrm{poly}(\log d)$ accuracy efficiently whereas naively sampling random states will not be
efficient.
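
A single-qubit sketch of this replacement follows. It assumes, as a standard fact not taken from the text, that the six eigenstates of the Pauli operators form a state 2-design for one qubit, and it describes the channel by Kraus operators, for which the integrand becomes $\sum_k |\langle U\psi | K_k \psi\rangle|^2$. The dephasing channel and all function names are illustrative choices.

```python
import math

S = 1 / math.sqrt(2)
# Eigenstates of Z, X and Y: assumed here to form a one-qubit state 2-design.
SIX_STATES = [
    [1, 0], [0, 1],
    [S, S], [S, -S],
    [S, 1j * S], [S, -1j * S],
]


def apply(matrix, vec):
    """Apply a 2x2 matrix (nested lists) to a 2-component vector."""
    return [sum(matrix[i][j] * vec[j] for j in range(2)) for i in range(2)]


def inner(u, v):
    """Inner product <u|v>, conjugate-linear in the first argument."""
    return sum(complex(a).conjugate() * b for a, b in zip(u, v))


def state_fidelity(kraus_ops, target, psi):
    """<psi| U^dagger E(|psi><psi|) U |psi> for E given by its Kraus operators."""
    ideal = apply(target, psi)
    return sum(abs(inner(ideal, apply(k, psi))) ** 2 for k in kraus_ops)


def average_gate_fidelity(kraus_ops, target):
    """Average of the integrand of Eqn. 5.1.5 over the six design states."""
    total = sum(state_fidelity(kraus_ops, target, psi) for psi in SIX_STATES)
    return total / len(SIX_STATES)


def dephasing(p):
    """Kraus operators of rho -> (1 - p) rho + p Z rho Z."""
    a, b = math.sqrt(1 - p), math.sqrt(p)
    return [[[a, 0], [0, a]], [[b, 0], [0, -b]]]
```

With the identity as target, `average_gate_fidelity(dephasing(p), [[1, 0], [0, 1]])` agrees with the Haar average $1 - 2p/3$ for this channel, using only six states.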

5.1.4 Data Hiding 

Data hiding was introduced by Terhal, DiVincenzo and Leung [TDL01, DLT02] as a
fundamentally quantum concept. The setting is that Alice and Bob share a quantum
state which contains secret bits. However, the state is chosen so that if they can
only communicate using LOCC then they cannot learn these secret bits. To encode one
secret bit, the "hider" constructs one of two orthogonal mixed states $\rho_0^{(m)}$ and $\rho_1^{(m)}$
and hands half to Alice and the other half to Bob. $\rho_0^{(m)}$ is the state with $m$ random
Bell pairs chosen subject to the constraint that the number of singlets is even; $\rho_1^{(m)}$
is the same state except with an odd number of singlets. The parameter $m$ controls
the degree of security.

The way that designs help here is in the construction of these states using minimal
resources. The authors show that $\rho_0^{(m)}$ can be obtained from twirling any initial pure
state of the form $|\psi\rangle\langle\psi| \otimes |\psi\rangle\langle\psi|$:¹

$$\rho_0^{(m)} = \int dU\, (U \otimes U)\, |\psi\rangle\langle\psi| \otimes |\psi\rangle\langle\psi|\, (U \otimes U)^\dagger. \quad (5.1.6)$$

$\rho_1^{(m)}$ can be created from $\rho_0^{(m)}$.

The authors consider replacing the Haar integral with a sum over a unitary 2-design.
If errors can be tolerated then an approximate 2-design can be used and the
state can be prepared efficiently.

¹By unitary invariance of the Haar measure, the choice of $|\psi\rangle$ does not affect the resultant state.



5.1.5 Decoupling and Evolution of Black Holes 

For various tasks in quantum Shannon theory, it is desirable to decouple a system from
the environment. In [HHYW07] and [ADHW06], it is shown that for most random
unitaries applied to the system the resulting overall state is close to a product state.

The setting is that there is a system $S$ with two parts $S_1$ and $S_2$. The environment
is $E$. Let the initial state be $\psi_{SE}$ and let

$$\sigma_{S_2E}(U) = \mathrm{tr}_{S_1}\left[(U \otimes I_E)\, \psi_{SE}\, (U^\dagger \otimes I_E)\right]. \quad (5.1.7)$$

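Then we have

Theorem 5.1.2 ([ADHW06], Theorem 4.2).

$$\int_{\mathcal{U}(S)} \|\sigma_{S_2E}(U) - \sigma_{S_2}(U) \otimes \sigma_E(U)\|_1^2\, dU \le \frac{d_S d_E}{d_{S_1}^2} \left\{ \mathrm{tr}\left[(\psi_{SE})^2\right] + \mathrm{tr}\left[(\psi_S)^2\right] \mathrm{tr}\left[(\psi_E)^2\right] \right\} \quad (5.1.8)$$

where $\sigma_{S_2}(U) = \mathrm{tr}_E\, \sigma_{S_2E}(U)$, etc.
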

The proof uses the 2-norm squared, which is a polynomial of degree 2 in the matrix
elements of the random unitary. Therefore the same result holds when $U$ is selected
from a unitary 2-design instead and, as above, an approximate design can be used to
allow an efficient implementation. This allows the encoding circuits in [ADHW06] to
be made efficient although unfortunately the decoding circuits are still inefficient.

Decoupling has also been used in the study of the evolution of black holes. While
many aspects of quantum gravity are not understood, some attempts have been made
to understand how black holes leak information. Two examples are by Hayden and
Preskill [HP07] and Sekino and Susskind [SS08]. We concentrate on the approach in
[HP07] here. The idea is that Alice wishes to destroy some quantum information by
throwing it into a black hole. However, Bob has been watching it and storing the
Hawking radiation emitted. The question they ask is how long Bob has to wait
before he can recover Alice's information.

Imagine that Alice's information is maximally entangled with a system $N$ held
by Charlie. Should Bob acquire a state from the emitted radiation that is maximally
entangled with $N$ then we say he has successfully recovered Alice's information. Decoupling
is used because, if what remains of the black hole after some evaporation is
uncorrelated with $N$, then the emitted radiation must be maximally entangled with
$N$ and Bob has succeeded. We therefore require that the evolution of the black hole
produces a decoupling unitary. If the evolution is random then, using Theorem 5.1.2,
this will likely happen, provided enough radiation has been emitted. In fact, if Bob
holds a system that is maximally entangled with the black hole's internal state before
Alice throws in her message, then he can recover her state with fidelity $1 - 2^{-c}$ by
reading in only the $k + c$ qubits emitted after Alice deposits her information, where
$k$ is the number of qubits in Alice's message.

This model is not physically realistic because most unitaries cannot be implemented
efficiently so the black hole would take far too long to apply the decoupling
unitary. However, as we said above, only a 2-design is required. In fact, in [HP07]
they consider the case that the evolution of a black hole is a local random quantum
circuit. This is similar to the random circuits discussed in Chapter 3 except they
assume that the unitaries are only applied to nearest-neighbour qubits. Should the
random circuit converge to a 2-design quickly enough (as Hayden and Preskill conjecture)
then the evolution will be sufficiently fast for Bob to find Alice's state. While
our results do not prove this, they could readily be extended to cover the local case
considered here.

5.1.6 Applications for Larger k 

So far we have only used $k$-designs for $k \le 4$. However, the higher $k$ is the more
similar a $k$-design is to a random unitary. In the next section we consider replacing
random unitaries with $k$-designs in large deviation bounds, thus finding applications
for larger $k$.

5.2 Derandomising Large Deviation Bounds 

The remainder of this chapter has been published previously as [Low09a].

There are many results in quantum information theory that show generic properties
of states or unitaries (e.g. [HLW06, HLSW04]). Often, these results say that, with
high probability, a random state or unitary has some property, for example high entropy.
However, as we have seen above, neither random unitaries nor random states
can be implemented efficiently. This limits the usefulness of such results since no
physical systems will behave truly randomly. To make such results more physically
relevant, it would be desirable to show that these properties are generic properties of
unitaries from some natural distribution that can be implemented efficiently. Only
then could we conclude that we would expect to see such properties in natural systems.

In many cases, the generic properties of unitaries are desirable but randomised constructions
given by the large deviation bounds are inefficient. We would like to come
up with distributions which can be implemented efficiently that have similar generic
properties. One example where the best known construction is an inefficient randomised
one is the $\infty$-norm randomising map (see Section 5.1.1). Another example is
locking of classical correlations [DHL+04, HLSW04], which is a quantum phenomenon
whereby a small amount of communication can greatly enhance the classical correlation
between two parties. To prove the randomised constructions, the authors show
that, with some non-zero probability, random unitaries have the required property.
However, there are no known efficient constructions of unitaries with these properties.
If, on the other hand, we could show that unitaries drawn randomly from a set
that can be implemented efficiently have the property with non-zero probability, we
could move an important step closer to finding efficient constructions. (It would not
actually provide an efficient construction unless we could find an efficient sampling
method.) In fact, for the case of $\infty$-norm randomisation, this was done by Aubrun in
[Aub09].

In this section we continue the theme of replacing the Haar measure with a $k$-design.
The reason for using $k$-designs is two-fold. Firstly, because the first $k$ moments
are the same we would expect similar (although weaker) measure concentration
results. Secondly, for $k = \mathrm{poly}(n)$ (when the design is on $n$ qubits), we might expect
to be able to implement the $k$-design efficiently (i.e. in $\mathrm{poly}(n)$ time). Indeed, for
$k = O(n/\log n)$, we can use the construction from Chapter 4 provided we allow for
approximate designs. However, in the applications we consider here we can always
make the approximation good enough to make the error negligible.

Not only can $k$-designs be constructed efficiently, they may even be the product
of generic dynamics. In Chapter 3 we show that random quantum circuits quickly
converge to a 2-design for a quite general model of such circuits. We also conjecture
in Chapter 3 that random circuits give $k$-designs for $k > 2$ and $k = \mathrm{poly}(n)$ in
polynomial time. If a physical system can be accurately modelled by a random circuit
then, assuming this conjecture, the naturally occurring states will be $k$-designs rather
than fully random states.

We now summarise some related results in this area. Smith and Leung [SL06]
and Dahlsten and Plenio [DP06] found large deviation bounds for stabiliser states.
They showed that, in certain regimes, stabiliser states are very likely to have large
entanglement. Stabiliser states are state 2-designs so our results can be seen as a
generalisation of this to $k$-designs for $k > 2$ and to other problems. There are also
some recent classical results related to the present work. Alon and Nussboim [AN08]
consider replacing full randomness with $k$-wise independence, a classical analogue of $k$-designs,
in random graph theory. They show that $k$-wise independent random graphs
with $k = \mathrm{poly}(\log N)$ ($N$ is the number of vertices) have similar generic properties to
fully random graphs.

In the remainder of this chapter, unless otherwise stated, we will use the definition
of an $\epsilon$-approximate unitary design given in terms of monomials, as in Definition
2.2.13. Using the tensor product expander construction of Chapter 4 together with
Lemma 2.2.14 gives an efficient construction for $k = O(\log d/\log\log d)$ for this definition.



5.2.1 Introductory Problem: Entanglement of a 2-design 

We now illustrate our main idea by showing a large deviation bound for the entanglement
of a 2-design, but in a different way to [SL06, DP06].

It has been known for a long time that random states are highly entangled across
any bipartition [Pag93, FK94, San95]. Further, in [HLW06], it is shown that random
unitaries generate almost maximally entangled states with high probability. However,
generating random states is inefficient so it is an interesting question to ask if random
efficiently obtainable states are highly entangled.

Let the system be $\mathcal{H} = \mathcal{H}_S \otimes \mathcal{H}_E$, where we label the two systems $S$ and $E$. Let
the dimensions be $d_S$ and $d_E$ and $d = d_S d_E$. Let the overall initial state be any fixed
pure state $\rho_0$. Then consider applying a random unitary $U$ to $SE$ to get the state
$\psi = U\rho_0 U^\dagger$. Then the von Neumann entropy $S(\psi_S) = -\mathrm{tr}\, \psi_S \log \psi_S$ of the reduced
state $\psi_S = \mathrm{tr}_E\, \psi$ is close to $\log_2 d_S$ (the maximal) with high probability:

Theorem 5.2.1 ([HLW06] Theorem 3.3). Let $d_E \ge d_S \ge 3$. Then for unitaries
chosen from the Haar measure

$$\mathbb{P}(S(\psi_S) < \log_2 d_S - \alpha - \beta) \le \exp\left(-\frac{(d - 1)\, C \alpha^2}{(\log_2 d_S)^2}\right) \quad (5.2.1)$$

where $C = (8\pi^2 \ln 2)^{-1}$ and $\beta = \frac{1}{\ln 2}\frac{d_S}{d_E}$.

Now, consider choosing the unitary from a 2-design instead. Later on (Lemma
5.4.1), we show that $\mathbb{E}\, \mathrm{tr}\, \psi_S^2 = \frac{d_S + d_E}{d_S d_E + 1} =: \mu$. Since purity is a polynomial of degree 2,
it does not matter if we take the expectation over the Haar measure or the 2-design.
We now apply Markov's inequality:

$$\mathbb{P}(\mathrm{tr}\, \psi_S^2 \ge \mu\gamma) \le \frac{1}{\gamma}.$$

Using the bound $S(\psi_S) \ge -\log_2 \mathrm{tr}\, \psi_S^2$ and some manipulations (the details are in
Section 5.4), this can be written as

$$\mathbb{P}(S(\psi_S) < \log_2 d_S - \alpha - \beta) \le 2^{-\alpha} \quad (5.2.2)$$

where $\beta$ is as in Theorem 5.2.1. This bound is much weaker than the bound in
Theorem 5.2.1 and, in particular, does not show stronger concentration as $d$ increases.
Later in the chapter, we will show that choosing unitaries from a $k$-design with larger
$k$ will give a much stronger bound that does give sharp concentration results for large
$d$.
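
The manipulation behind Eqn. 5.2.2 can be sketched numerically. The code below assumes the Haar-average purity $\mu = (d_S + d_E)/(d_S d_E + 1)$ and $\beta = d_S/(d_E \ln 2)$; both are stated here as assumptions (the first is the standard average purity of a random bipartite pure state, the second the value used in [HLW06]), and the function names are illustrative. The key step is $-\log_2 \mu \ge \log_2 d_S - \beta$: an entropy below $\log_2 d_S - \alpha - \beta$ then forces $\mathrm{tr}\, \psi_S^2 > \mu 2^\alpha$, which Markov's inequality with $\gamma = 2^\alpha$ bounds by $2^{-\alpha}$.

```python
import math


def mean_purity(d_s, d_e):
    """Assumed Haar (and 2-design) average of tr(psi_S^2)."""
    return (d_s + d_e) / (d_s * d_e + 1)


def beta(d_s, d_e):
    """Assumed entropy offset beta = d_S / (d_E ln 2)."""
    return d_s / (d_e * math.log(2))


def offset_is_sufficient(d_s, d_e):
    """Check -log2(mu) >= log2(d_S) - beta, the step that turns Markov into Eqn. 5.2.2."""
    return -math.log2(mean_purity(d_s, d_e)) >= math.log2(d_s) - beta(d_s, d_e)


def two_design_tail(alpha):
    """Markov bound with gamma = 2^alpha on P(S < log2 d_S - alpha - beta)."""
    return 2.0 ** (-alpha)
```

Note that `two_design_tail` does not depend on the dimension at all, which is the weakness of the 2-design bound pointed out above.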

5.2.2 Main Results 

We will now state our main results. 
Our most general result is: 

Theorem 5.2.2. Let $f$ be a polynomial of degree $K$. Let $f(U) = \sum_i \alpha_i M_i(U)$ where
$M_i(U)$ are monomials and let $\alpha(f) = \sum_i |\alpha_i|$. Suppose that $f$ has probability concentration

$$\mathbb{P}_{U \sim \mathcal{U}(d)}(|f - \mu| \ge \delta) \le C e^{-a\delta^2} \quad (5.2.3)$$

and let $\nu$ be an $\epsilon$-approximate unitary $k$-design. Then

$$\mathbb{P}_{U \sim \nu}(|f - \mu| \ge \delta) \le \frac{1}{\delta^{2m}} \left( C \left(\frac{m}{a}\right)^m + \frac{\epsilon}{d^k} (\alpha + |\mu|)^{2m} \right) \quad (5.2.4)$$

for integer $m$ with $2mK \le k$.

We therefore take a bound for Haar random unitaries of the form Eqn. 5.2.3 and
turn it into a bound for $k$-designs. Often, we will use Levy's Lemma (Lemma 5.3.2)
to give the initial concentration bound in Eqn. 5.2.3. In this case, $a = O(d)$ (provided
the Lipschitz constant (see later) is constant).

We then apply this to entropy, as a generalisation of Section 5.2.1. We go via the
2-norm since the entropy function is not a polynomial. We find

2 

Theorem 5.2.3. Let u he a 4r^ -approximate unitary loiog^ „ -design on dimension 
2'^ with n > 19. Let dsds = 2" and 2 < ds < 2"/^° and a > 2. Then 

P(/~.(S(V5) < log2 ds-a-(3)< 8exp2 ("^^j^ + ^^'^'^^ 

where f3 = ^ and exp2 is the exponential function base 2. 

We choose a /c-design for k = j^q^"^ ^ since this is (up to constants) the largest k 
for which we have an efficient unitary $k$-design construction (using the construction
of Chapter 4).

We then move on to apply our results to ideas in statistical mechanics from
Popescu et al. [PSW06]. In this paper, the authors show that, for almost all pure
states of the universe, any subsystem is very close to the canonical state, which is the
state obtained by assuming a uniform distribution over all allowed states of the universe
(defined in Eqn. 5.5.2). This could be achieved if the dynamics of the universe
produced a random unitary, but this would take exponential time in the size of the
universe. We show that the random unitary can be replaced by a $k$-design, showing
that the canonical state can be reached in polynomial time:

Theorem 5.2.4. Let $\Omega_S$ be the canonical state of the system (defined in Eqn. 5.5.2)
and $\rho_S$ be the state after choosing a unitary from an $\epsilon$-approximate $k$-design. Let $d_R$
be the dimension of the universe's Hilbert space subject to the arbitrary constraint $R$

„ /4(^3 \ k/8 

(normally this will he a total energy constraint). Then for e < f ( "3^") ' ^ — 9^ 
ru^.{\\ps-ns\\i>5)<6i^-^j . (5.2.6) 
Finally, we use results from [GFE09] to show that most states in an $O(1)$-approximate
state $n^2$-design on $n$ qubits are useless for measurement-based quantum computing,
in the sense that any computation using such states could be simulated efficiently
on a classical computer. We do this, following [GFE09], by showing that the states
are so entangled that the measurement outcomes are essentially random.

5.2.3 Optimality of Results 

An important question is how close our results are to optimal, in terms of their
scaling with dimension $d$. In Theorem 5.2.2 we will normally have $a = \Omega(d)$ so for $m$
constant, we obtain polynomial bounds, rather than the exponential bounds for full
randomness. This is to be expected:

Theorem 5.2.5. Let $\nu$ be an $\epsilon$-approximate unitary $k$-design. Suppose also that it
is discrete, i.e. contains a finite number $S$ of unitaries. Let $f(U)$ be any function on
matrix elements of $U$ and $\mu$ be any constant. Then either $f(U) = \mu$ for all $U$ in $\nu$ or
for some $\delta > 0$

$$\mathbb{P}_{U \sim \nu}(|f(U) - \mu| \ge \delta) \ge P_{\min} \quad (5.2.7)$$

where $P_{\min}$ is the probability of choosing the least probable unitary from $\nu$. If the
probability is uniform, $P_{\min} = 1/S$.

Proof. There exists at least one $U$ such that $|f(U) - \mu| \ge \delta$ for some $\delta > 0$; the
probability of selecting one such $U$ is at least $P_{\min}$. □

Corollary 5.2.6. Our results are polynomially related to the optimal (i.e. the optimal
bounds can be obtained by raising ours to a constant power).

Proof. Our results apply for any design, so must obey the bound in Theorem 5.2.5
for all designs. The unitary design construction we use (from Chapter 4 using Lemma
2.2.14) has
Pmin — d ^^^^ hence the bounds cannot scale better than this. D 
We can also almost recover the tail bound for full randomness in Theorem 5.2.2.
Suppose for simplicity that we have an exact design (i.e. $\epsilon = 0$), so that

$$\mathbb{P}_{U \sim \nu}(|f - \mu| \ge \delta) \le C \left(\frac{m}{a\delta^2}\right)^m.$$

The optimal $m$ is $a\delta^2/e$, which gives

$$\mathbb{P}_{U \sim \nu}(|f - \mu| \ge \delta) \le C e^{-a\delta^2/e}.$$

So our result allows us to interpolate from Markov's inequality, which gives weak
bounds, all the way to full Haar randomness and is within a polynomial correction of
optimal for the full range.
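
The choice of $m$ can be explored numerically. The sketch below evaluates the exact-design bound $C(m/(a\delta^2))^m$ and searches over the integers allowed by the constraint $2mK \le k$ from Theorem 5.2.2; the function names and the sample parameters are illustrative assumptions only.

```python
import math


def exact_design_bound(c, a, delta, m):
    """Tail bound C (m / (a delta^2))^m for an exact design (epsilon = 0)."""
    return c * (m / (a * delta ** 2)) ** m


def best_integer_m(c, a, delta, k, degree):
    """Integer m in 1..floor(k / (2 * degree)) that minimises the bound."""
    m_max = k // (2 * degree)
    return min(range(1, m_max + 1), key=lambda m: exact_design_bound(c, a, delta, m))


def haar_like_bound(c, a, delta):
    """Value C exp(-a delta^2 / e) reached at the continuous optimum m = a delta^2 / e."""
    return c * math.exp(-a * delta ** 2 / math.e)
```

When $k$ is large enough that $m = a\delta^2/e$ is allowed, the best integer $m$ sits next to that value and the bound is within a small factor of `haar_like_bound`; when $k$ is small, the bound is still decreasing in $m$ over the allowed range, so the largest allowed $m$ is chosen, matching the remark that $m$ is usually set close to the maximum.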

The remainder of the chapter is organised as follows. In Section 5.3, we present
our main technique for finding large deviation bounds for $k$-designs. We then apply
this to entropy in Section 5.4, to ideas in statistical mechanics in Section 5.5 and to
using $k$-designs for measurement-based quantum computing in Section 5.6. We then
conclude in Section 5.7.

5.3 Main Technique 

The main idea in this chapter can be summarised in three steps. Let $f : \mathcal{U}(d) \to \mathbb{C}$
be a balanced polynomial of degree $K$ in the matrix elements of a unitary $U$. Then
to get a concentration bound on $f$ when $U$ is chosen from a $k$-design:

1. Find some measure concentration result for $|f(U) - \mu|$ when the unitaries are
chosen uniformly at random from the Haar measure. Normally $\mu$ will be the
expectation of $f$.

2. Use this to bound the moments $\mathbb{E}|f(U) - \mu|^{2m}$ for some integer $m \le k/(2K)$.

3. Then use Markov's inequality and the fact that for a (approximate) $k$-design the
moments are (almost) the same as for uniform randomness. We then optimise
the bound for $m$, which will often involve setting $m$ close to the maximum.

We will now work through each of these steps and finish with a proof of Theorem
5.2.2.


5.3.1 Step 1: Concentration for uniform randomness 

For the first step, we will often start with Levy's Lemma. This states, roughly speaking,
that slowly varying functions in high dimensions are approximately constant. We
quantify 'slowly varying' by the Lipschitz constant:

Definition 5.3.1. The Lipschitz constant $\eta$ (with respect to the Euclidean norm) for
a function $f$ is

$$\eta = \sup_{U_1, U_2} \frac{|f(U_1) - f(U_2)|}{\|U_1 - U_2\|_2}. \quad (5.3.1)$$

Then we have Levy's lemma:

Lemma 5.3.2 (Levy, see e.g. [Led01]). Let $f$ be an $\eta$-Lipschitz function on $\mathcal{U}(d)$ with
mean $\mathbb{E}f$. Then

P(|/ - E/l >6)< 4exp (-^^) (5.3.2) 
where C^ can he taken to he tt^. 

5.3.2 Step 2: A bound on the moments 

Levy's Lemma says that $f$ is close to its mean. This means that $\mathbb{E}|f - \mathbb{E}f|^m$ should
be small. We will bound the moments for slightly more general concentration results:

Lemma 5.3.3. Let $X$ be any random variable with probability concentration

$$\mathbb{P}(|X - \mu| \ge \delta) \le C e^{-a\delta^2}. \quad (5.3.3)$$

(Normally $\mu$ will be the expectation of $X$, although the bound does not assume this.)
Then

$$\mathbb{E}|X - \mu|^m \le C\, \Gamma(m/2 + 1)\, a^{-m/2} \le C \left(\frac{m}{2a}\right)^{m/2} \quad (5.3.4)$$

for any $m > 0$ (the second inequality uses $\Gamma(x + 1) \le x^x$, which holds for $x = m/2 \ge 1$).



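Proof. This proof is based on the proof of an analogous result by Bellare and Rompel
[BR94], Lemma A.1.

Note that, for any random variable $Y \ge 0$,

$$\mathbb{E}Y = \int_0^\infty \mathbb{P}(Y \ge y)\,dy. \tag{5.3.5}$$

Therefore

$$
\begin{aligned}
\mathbb{E}|X - \mu|^m &= \int_0^\infty \mathbb{P}(|X - \mu|^m \ge x)\,dx \\
&= \int_0^\infty \mathbb{P}(|X - \mu| \ge x^{1/m})\,dx \\
&\le C \int_0^\infty \exp(-a x^{2/m})\,dx
\end{aligned}
$$

where in the last line we used the assumed large deviation bound Eqn. 5.3.3. To
evaluate this integral, use the change of variables $y = a x^{2/m}$ to get

$$
\begin{aligned}
\mathbb{E}|X - \mu|^m &\le \frac{Cm}{2}\,a^{-m/2} \int_0^\infty e^{-y} y^{m/2 - 1}\,dy \\
&= C a^{-m/2}\,\Gamma(m/2 + 1) \\
&\le C \left(\frac{m}{2a}\right)^{m/2}.
\end{aligned}
$$

□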
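
As a sanity check of Lemma 5.3.3, the two bounds can be compared with a case where the moments are known exactly. The sketch below assumes a standard normal variable, which satisfies the tail bound with $C = 2$ and $a = 1/2$, and whose absolute moments are $2^{m/2}\Gamma((m+1)/2)/\sqrt{\pi}$. The function names are hypothetical, and the comparison is run for integer $m \ge 2$ only.

```python
import math

def moment_bounds(C, a, m):
    """Return the two upper bounds of Eqn. 5.3.4 on E|X - mu|^m."""
    gamma_bound = C * math.gamma(m / 2 + 1) * a ** (-m / 2)
    power_bound = C * (m / (2 * a)) ** (m / 2)
    return gamma_bound, power_bound

def gaussian_abs_moment(m):
    """Exact E|X|^m for a standard normal X (the assumed test distribution)."""
    return 2 ** (m / 2) * math.gamma((m + 1) / 2) / math.sqrt(math.pi)

if __name__ == "__main__":
    # A standard normal satisfies P(|X| >= delta) <= 2 exp(-delta^2 / 2).
    for m in range(2, 11):
        exact = gaussian_abs_moment(m)
        gamma_bound, power_bound = moment_bounds(2.0, 0.5, m)
        print(m, exact, gamma_bound, power_bound)
```

The Gamma-function bound is the tighter of the two; the simpler power form is what gets carried into the later Markov-inequality step.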



5.3.3 Step 3: A concentration bound for a k-design



Now we show how to obtain a measure concentration result for polynomials when the
unitaries are selected from an approximate k-design. We first show that the moments
of $|f - \mu|$ for $f$ a polynomial are close to the Haar measure moments:

Lemma 5.3.4. Let $f$ be a balanced polynomial of degree $K$ and $\mu$ be any constant.
Let $f = \sum_i \alpha_i M_i$ where each $M_i$ is a monomial. Let $\alpha(f) = \sum_i |\alpha_i|$. Then for $m$
an integer with $2mK \le k$ and $\nu$ an $\epsilon$-approximate k-design,

$$\mathbb{E}_{U \sim \nu} |f - \mu|^{2m} \le \mathbb{E}_{U \sim U(d)} |f - \mu|^{2m} + \frac{\epsilon}{d^k} (\alpha + |\mu|)^{2m}. \tag{5.3.6}$$



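Proof. For simplicity, we assume that $f$ and $\mu$ are real. Our proof easily generalises
to the complex case.

Firstly we calculate $|\mathbb{E}_{U \sim \nu} f^i - \mathbb{E}_{U \sim U(d)} f^i|$ using the multinomial theorem. Each
product of monomials below has degree at most $iK \le k$, so the approximate design
property applies to it:

$$
\begin{aligned}
\left|\mathbb{E}_{U \sim \nu} f^i - \mathbb{E}_{U \sim U(d)} f^i\right|
&= \left| \sum_{k_1 + \dots + k_t = i} \binom{i}{k_1, \dots, k_t} \alpha_1^{k_1} \cdots \alpha_t^{k_t} \left(\mathbb{E}_{U \sim \nu} - \mathbb{E}_{U \sim U(d)}\right) M_1^{k_1} \cdots M_t^{k_t} \right| \\
&\le \frac{\epsilon}{d^k} \sum_{k_1 + \dots + k_t = i} \binom{i}{k_1, \dots, k_t} |\alpha_1|^{k_1} \cdots |\alpha_t|^{k_t} \\
&= \frac{\epsilon}{d^k}\,\alpha^i.
\end{aligned}
$$

We now calculate $\mathbb{E}_{U \sim \nu} |f - \mu|^{2m}$:

$$
\begin{aligned}
\left|\mathbb{E}_{U \sim \nu} |f - \mu|^{2m} - \mathbb{E}_{U \sim U(d)} |f - \mu|^{2m}\right|
&= \left|\mathbb{E}_{U \sim \nu} (f - \mu)^{2m} - \mathbb{E}_{U \sim U(d)} (f - \mu)^{2m}\right| \\
&= \left| \sum_{i=0}^{2m} \binom{2m}{i} \left(\mathbb{E}_{U \sim \nu} f^i - \mathbb{E}_{U \sim U(d)} f^i\right) (-\mu)^{2m - i} \right| \\
&\le \frac{\epsilon}{d^k} \sum_{i=0}^{2m} \binom{2m}{i} \alpha^i |\mu|^{2m - i} \\
&= \frac{\epsilon}{d^k} (\alpha + |\mu|)^{2m}.
\end{aligned}
$$

□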



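Now we can simply apply Markov's inequality to prove Theorem 5.2.2.

Proof of Theorem 5.2.2. Apply Markov's inequality and Lemmas 5.3.3 and 5.3.4,
where $C$ and $a$ are the constants in the Haar concentration bound of the form Eqn. 5.3.3:

$$
\begin{aligned}
\mathbb{P}_{U \sim \nu}(|f - \mu| \ge \delta) &= \mathbb{P}_{U \sim \nu}(|f - \mu|^{2m} \ge \delta^{2m}) \\
&\le \frac{\mathbb{E}_{U \sim \nu} |f - \mu|^{2m}}{\delta^{2m}} \\
&\le \frac{1}{\delta^{2m}} \left( C \left(\frac{m}{a}\right)^m + \frac{\epsilon}{d^k} (\alpha + |\mu|)^{2m} \right).
\end{aligned}
$$

□

We finish this section with two remarks. Firstly, provided $\alpha(f)$ (the sum of the
absolute values of all the coefficients) is at most polynomially large in $d$, we can choose
$\epsilon$ to be polynomially small to cancel this at no change to the asymptotic efficiency.
Secondly, when applying the theorem we will optimise the choice of $m$ (and normally
choose $k = 2mK$). Often $a = \Omega(d)$ and the optimal choice of $m$ is often $\Theta(d)$ as
well. However, we will not take $m$ so large because we can only implement an efficient
k-design for $k = O(\log d / \log\log d)$.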



5.4 Application 1: Entropy of a k-design

We now apply the above to show that most unitaries in a k-design generate large
amounts of entropy across any bipartition, provided the dimensions are sufficiently
far apart. This means that, for any initial state, for most choices of a unitary from a
k-design applied to the state, the resulting state will be highly entangled. We go via the
purity of the reduced density matrix, since the entropy function is not a polynomial.

We will call the two systems $S$ (the 'system') and $E$ (the 'environment') and
calculate the purity of the reduced state. That the purity, $\mathrm{tr}\,(\mathrm{tr}_E\, U\rho U^\dagger)^2$, is a
balanced polynomial of degree 2 is easily seen by noting that the trace is linear and
the reduced state is squared. However, we should check that there are not too many
terms or terms with large coefficients. To do this, we should calculate $\alpha$ to apply
Theorem 5.2.2.

There is a general method for calculating $\alpha(f)$ which we will use. Write $f(U) =
\sum_i \alpha_i M_i(U)$ for monomials $M_i$. To evaluate $\alpha(f) = \sum_i |\alpha_i|$, calculate $f(A)$ where $A$
is the matrix with all entries equal to 1 (so that $M_i(A) = 1$) and replace $\alpha_i$ with $|\alpha_i|$.
Using this here we find

$$\alpha \le d^4 \|\rho\|_2^2,$$

which is at most $d^4$ since $\|\rho\|_2 \le 1$ for any state $\rho$.



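We now calculate the expected purity:

Lemma 5.4.1. The expected purity of the reduced state is $\frac{d_S + d_E}{d + 1}$, where $d_S$ is the
dimension of subsystem $S$ and $d_E = d/d_S$ is the dimension of subsystem $E$.

Proof. We have

$$\mathbb{E}_{U \sim U(d)}\, \mathrm{tr}\,(\mathrm{tr}_E\, U\rho U^\dagger)^2 = \mathbb{E}_{U \sim U(d)}\, \mathrm{tr}\, \mathcal{F}_{S_1 S_2} \left(\mathrm{tr}_E\, U\rho U^\dagger \otimes \mathrm{tr}_E\, U\rho U^\dagger\right) \tag{5.4.1}$$

where $\mathcal{F}_{S_1 S_2}$ is the swap acting between systems $S_1$ and $S_2$. By linearity of the trace, we
can commute the $\mathbb{E}_{U \sim U(d)}$ through and use $\mathbb{E}_{U \sim U(d)} [U\rho U^\dagger \otimes U\rho U^\dagger] = \frac{I + \mathcal{F}}{d(d+1)}$ to find

$$
\begin{aligned}
\mathbb{E}_{U \sim U(d)}\, \mathrm{tr}\,(\mathrm{tr}_E\, U\rho U^\dagger)^2
&= \frac{\mathrm{tr}\left[(\mathcal{F}_{S_1 S_2} \otimes I_{E_1 E_2})(I + \mathcal{F}_{S_1 S_2} \otimes \mathcal{F}_{E_1 E_2})\right]}{d(d+1)} \\
&= \frac{d_S d_E^2 + d_S^2 d_E}{d(d+1)} \\
&= \frac{d_S + d_E}{d + 1}.
\end{aligned}
$$

□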
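
Lemma 5.4.1 can be checked by simulation. The following is a minimal Monte Carlo sketch, assuming a pure initial state so that $U|\psi\rangle$ is a Haar-random pure state, sampled here as a normalised complex Gaussian vector. The function names, sample count and seed are hypothetical choices for illustration.

```python
import random

def random_state(d_s, d_e, rng):
    """Haar-random pure state on S (x) E, stored as a d_s x d_e amplitude matrix."""
    amps = [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(d_e)]
            for _ in range(d_s)]
    norm = sum(abs(x) ** 2 for row in amps for x in row) ** 0.5
    return [[x / norm for x in row] for row in amps]

def reduced_purity(psi):
    """Purity tr(rho_S^2) of the reduced state rho_S = tr_E |psi><psi|."""
    d_s = len(psi)
    rho = [[sum(a * b.conjugate() for a, b in zip(psi[i], psi[j]))
            for j in range(d_s)] for i in range(d_s)]
    # For Hermitian rho, tr(rho^2) is the sum of |rho_ij|^2.
    return sum(abs(x) ** 2 for row in rho for x in row)

def mean_purity(d_s, d_e, samples, seed=0):
    rng = random.Random(seed)
    return sum(reduced_purity(random_state(d_s, d_e, rng))
               for _ in range(samples)) / samples

def expected_purity(d_s, d_e):
    """The value (d_S + d_E) / (d + 1) from Lemma 5.4.1, with d = d_S * d_E."""
    return (d_s + d_e) / (d_s * d_e + 1)

if __name__ == "__main__":
    print(mean_purity(2, 3, 20000), expected_purity(2, 3))
```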



Working out the higher moments in this way is difficult (although it has been done
in [Gir07]) so we use Levy's Lemma and Lemma 5.3.3. To use Levy's Lemma, all we
have to do is find the Lipschitz constant for the purity:

Lemma 5.4.2. The Lipschitz constant for purity is $\le 2$.

Proof.

$$
\begin{aligned}
\eta &= \sup_{\psi, \phi} \frac{\left|\|\psi_S\|_2^2 - \|\phi_S\|_2^2\right|}{\|\psi - \phi\|_2} \\
&= \sup_{\psi, \phi} \frac{\left|\|\psi_S\|_2 - \|\phi_S\|_2\right| \left(\|\psi_S\|_2 + \|\phi_S\|_2\right)}{\|\psi - \phi\|_2}.
\end{aligned}
$$

Now we use $\left|\|S\|_2 - \|T\|_2\right| \le \|S - T\|_2$ to find

$$\eta \le \sup_{\psi, \phi} \left(\|\psi_S\|_2 + \|\phi_S\|_2\right) \le 2$$

using the fact that the purity is upper bounded by 1. □

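Lemma 5.4.3. For $\mu = \frac{d_S + d_E}{d + 1}$ and $m$ an integer with $m \le k/4$ and $\nu$ an
$\epsilon$-approximate k-design,

$$\mathbb{P}_{U \sim \nu}\left(S(\psi_S) \le -\log_2(\mu(1 + \delta))\right) \le \frac{1}{(\delta\mu)^{2m}} \left( 4 \left(\frac{4m}{C_1 d}\right)^m + \frac{\epsilon}{d^k} (d^4 + \mu)^{2m} \right). \tag{5.4.2}$$

Proof. We use the fact that von Neumann entropy is lower bounded by the Renyi
2-entropy, i.e.

$$S(\psi_S) \ge S_2(\psi_S) = -\log_2 \|\psi_S\|_2^2. \tag{5.4.3}$$

Then

$$
\begin{aligned}
\mathbb{P}_{U \sim \nu}\left(S(\psi_S) \le -\log_2(\mu(1 + \delta))\right)
&\le \mathbb{P}_{U \sim \nu}\left(S_2(\psi_S) \le -\log_2(\mu(1 + \delta))\right) \\
&= \mathbb{P}_{U \sim \nu}\left(\|\psi_S\|_2^2 \ge \mu + \delta\mu\right) \\
&\le \mathbb{P}_{U \sim \nu}\left(\left|\|\psi_S\|_2^2 - \mu\right| \ge \delta\mu\right) \\
&\le \frac{1}{(\delta\mu)^{2m}} \left( 4 \left(\frac{4m}{C_1 d}\right)^m + \frac{\epsilon}{d^k} (d^4 + \mu)^{2m} \right)
\end{aligned}
$$

using Theorem 5.2.2 in the last line. □

We have written this in a more convenient form in Theorem 5.2.3, which is proved
in Section 5.8. This is to be compared with the Haar random version, Theorem 5.2.1.
As expected, we have $n = \log_2 d$ appearing in the exponent rather than $d$. Note also
that our bound does not work well for $d_S \approx d_E$. In fact, in this case, we do not get a
bound that improves with dimension. In order to achieve such a bound in this regime
a different technique will be necessary.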

5.5 Application 2: k-designs and Statistical Mechanics

We can also apply these ideas to partially derandomise some of the arguments on the
foundations of statistical mechanics in [PSW06]. In this paper, the authors develop
the idea that the uncertainty in statistical mechanics comes from entanglement rather
than the traditional assumption of the principle of equal a priori probabilities. They
consider the universe being in a pure quantum state and that the uncertainty in the
state of a subsystem comes from the entanglement between this system and the rest
of the universe.

The setting is that there is an arbitrary global linear constraint $R$. Often this will
be a total energy constraint although this is not assumed. Let the Hilbert space of
states satisfying $R$ be $\mathcal{H}_R$. Then let the system and environment Hilbert spaces be
$\mathcal{H}_S$ and $\mathcal{H}_E$ respectively. Then

$$\mathcal{H}_R \subseteq \mathcal{H}_S \otimes \mathcal{H}_E. \tag{5.5.1}$$

Let the dimensions be $d_R$, $d_S$ and $d_E$ and let $\mathcal{E}_R = I_R / d_R$. Note that $d_R \le d_S d_E$, unlike
in the above where we took $d = d_S d_E$. Normally we will have $d_S \ll d_R$. The principle
of equal a priori probabilities says that the state of the universe is $\mathcal{E}_R$, which implies
the subsystem state is the canonical state, given by

$$\Omega_S = \mathrm{tr}_E\, \mathcal{E}_R. \tag{5.5.2}$$

The main result of [PSW06] (the 'principle of apparently equal a priori probabilities')
is that, for almost all pure states of the universe, the subsystem state is almost exactly
the canonical state.

Theorem 5.5.1 (Theorem 1 of [PSW06]). For a randomly chosen state $|\phi\rangle \in \mathcal{H}_R \subseteq
\mathcal{H}_S \otimes \mathcal{H}_E$ and arbitrary $\epsilon > 0$, the distance between the reduced density matrix of the
system $\rho_S = \mathrm{tr}_E(|\phi\rangle\langle\phi|)$ and the canonical state $\Omega_S$ (Eqn. 5.5.2) is given probabilistically
by

$$\mathbb{P}\left(\|\rho_S - \Omega_S\|_1 \ge \epsilon + \sqrt{\frac{d_S}{d_E^{\mathrm{eff}}}}\right) \le 2 \exp(-C_2 d_R \epsilon^2) \tag{5.5.3}$$

where $C_2 = 1/(18\pi^3)$ and $d_E^{\mathrm{eff}} = \frac{1}{\mathrm{tr}\,\Omega_E^2} \ge \frac{d_R}{d_S}$, with $\Omega_E = \mathrm{tr}_S\, \mathcal{E}_R$.



This result gives compelling evidence to replace the principle of equal a priori
probabilities with the principle of apparently equal a priori probabilities, but it does
not address the problem of how the system reaches this state. It will take an extremely
(exponentially) long time for the universe to reach a random pure state, in contrast to
the observed fact that thermalisation occurs quickly. Here, we show that for almost
all unitaries in a k-design applied to the universe, the subsystem state is close to the
canonical state. Since these unitaries can be implemented and sampled efficiently,
this means that equilibrium could be reached quickly to match observations.

We are now ready to show that a k-design gives a small $\|\rho_S - \Omega_S\|_1$. First, we
have to modify Lemma 5.3.3 slightly:

Lemma 5.5.2. Let $X$ be any non-negative random variable with probability concentration

$$\mathbb{P}(X \ge \delta + \eta) \le C e^{-a\delta^2} \tag{5.5.4}$$

where $\eta \ge 0$. Then

$$\mathbb{E}X^m \le C \left(\frac{2m}{a}\right)^{m/2} + (2\eta)^m \tag{5.5.5}$$

for any $m > 0$.

The proof is very similar to the proof of Lemma 5.3.3.
Now we state and prove the main result in this section:
Theorem 5.5.3. Let $\nu$ be an $\epsilon$-approximate unitary k-design. Then

(5.5.6) 

In particular, with e = § ( "d^ ) ' ^ — 8(72(^5, 



Pc/~.(||/05-^^5||i>5)<6f^J . (5.5.7) 

Again, we need $d_S$ to be polynomially smaller than $d_R$ to obtain non-trivial
bounds.

Proof. We go via the 2-norm and use Lemmas 5.5.2 and 5.3.4.
We have from Theorem 5.5.1 that



Ur^U{d){\\PS 



^sWi >S + r])< 2e-^2<ifl52 ^^^^ 



where t? = < Since \\ps - ^sh <\\ps- ^s\\i, 

^u^uid){\\ps -^s\\2>S + 7])< 2e-^2rf«<5^ (5,5,9) 
We now apply Lemma 15.5.21 to get 

^U^U(d)\\ps-mf^<'^(^-^j +(2ry)'"^. (5.5.10) 

So for m < /c/4, using Markov's inequality and Lemma 15.3.41 (with = 0) on the 
polynomial \ \ps — ^s\\2 ■ 

Pu^Mps - m2 >6)<^(2 (^)™ + (2r/)'"^ + ^(4 + I)''") • (5.5.11) 
Here, we used an estimate of $\alpha$, the sum of the moduli of the coefficients:

a<(4 + l)^ (5.5.12) 



which we obtain via a similar calculation to that in Section [57 

Now we go back to the 1-norm, using — < Vds\\ps — ^slU to get 

Pc/~.(||ps - ^sWi >S)< Fu^uiWps - ^s\\2 > (5.5.13) 

(5.5.14) 

To obtain the result in Eqn. I5.5.6| we just use r/ < and set m = k/8. 

To prove the simplified version, first use, as in Section [5.41 that {dj^ + l)'^^ < 
for m < d\/8. This is implied by A; < 8C2(i|. We then set m = k/8 to find 

kds /44\'-" , e 



] , we obtain the simplified result Eqn. l5.5r7l 

□ 



5.6 Application 3: Using k-designs for Measurement-Based Quantum Computing

Here we apply our ideas to partially derandomise some results of Gross, Flammia and
Eisert in [GFE09] and Bremner, Mora and Winter in [BMW09]. The main result in
these two papers is that most states do not offer any advantage over classical computation
when used in the measurement-based quantum computing (MBQC) model. In
MBQC, a classical computer is given access to a large quantum state on which it can
do single qubit measurements. Some states allow for universal quantum computation
whereas others do not add any extra power to the classical computer. These results
are concerned with the question of characterising which states do and do not work.
Showing that random states do not give any speed up shows that useful states for
MBQC are not generic and so must be carefully constructed.

While the results in these two papers are similar, we will concentrate on the
methods from [GFE09] since their methods are simpler to apply here. They prove
their result by showing that most states are very entangled in the geometric measure
(see Definition 5.6.1). They then use this to show that the measurement outcomes of
even the best possible measurement scheme are almost completely random. In fact,
the state could be thrown away and the measurement outcomes replaced with random
numbers to solve the computational problem just as efficiently. This shows that you
can classically simulate any quantum computation that uses these highly entangled
states. The measure of entanglement they use is the geometric measure:
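Definition 5.6.1. The geometric measure of entanglement of a state $|\Psi\rangle$ is [Shi95]

$$E_g(|\Psi\rangle) = -\log_2 \sup_{|\alpha\rangle \in \mathcal{P}} |\langle\alpha|\Psi\rangle|^2 \tag{5.6.1}$$

where $\mathcal{P}$ is the set of all product states.

They show that any MBQC using a state $|\Psi\rangle$ with $E_g(|\Psi\rangle) = n - O(\log_2 n)$ can
be efficiently simulated classically. They then show that

Theorem 5.6.2 ([GFE09], Theorem 2). For $n \ge 11$,

$$\mathbb{P}_{|\Psi\rangle \sim \mathrm{Haar}}\left(E_g(|\Psi\rangle) \le n - 2\log_2 n - 3\right) \le e^{-n^2}. \tag{5.6.2}$$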

This shows that most states are useless. We partially derandomise this result to 
show that most states in an e-approximate (e can be taken as a constant) state n'^- 
design have high geometric measure of entanglement and thus are useless in the same 
way. 

We could apply our technique and use Theorem 5.2.2 but in this case, it is simpler
to directly bound the probability using Markov's inequality.
Lemma 5.6.3. 

777 I / 777 \ TTl 

^mumn' ^ ^) ^ (1 + ^) ^ (1 + ^) ids) ^'-'-^^ 

where 1^*) is chosen from an e- approximate state k-design v, m < k and a positive 
integer and |$) is any fixed state. 

Proof. We prove this bound directly using Markov's inequality: 



- 

^ I d-l ) 

1 + e 



(1 + e)m! /- X / w-x"* ^ 

We now prove the main result in this section: 

Theorem 5.6.4. For \^) randomly drawn from an e-approximate state k-design with 
ci = 2" 

¥\^)^^{Eg{\^)) < n-6) < (l+e)exp2(/clog2 2A;+4nlog2l0n-fc5+4n(n-(5)). (5.6.4) 
In particular, for k = , 5 = 2> log2 ra + 5 and e = 1, 

W\^^^^{Eg{\^)) <n-31og2n-5) <2-n-"'. (5.6.5) 
We note that this bound is almost the same as in Theorem 5.6.2. It only works for
slightly larger deviations from $n$, which is why we obtain a slightly better probability
bound. Note also that we can obtain an exponential bound in $n$ (not $d = 2^n$) because
the design is exponentially large in $n$.

Proof. This proof closely mirrors the proof of Theorem 2 in |GFE09j . We use the idea 
of a 7" net. Af-y^n is a 7-net on product states if 

sup inf Ilia) -|d)||2< 7/2. (5.6.6) 

In j GFEOQj . it is shown that such a net exists with lAA-y^nl ^ (572/7)'*"'. We then 
proceed by showing that most states in the state design have small overlap with every 
state in the net using the union bound and Lemma l5.6.3l Finally, since every state is 
close to one in the net, we can show that most states in the design have small overlap 
with every product state. 

We now formalise the above. Using Lemma 15.6.31 and the union bound, 



"1^)^^ sup \{a 

\|a)GA/'-y,„ 



> S'/2^ < |AA,,.|(l + e) < + 



2k ^ ^ 



2-^5' 
(5.6.7) 



Now, we need to bound 

P|^)^,(S<,(|^)) <n-(5) =P|*)^J-log2 sup \{a\^)\'<n-5 



'\^)r.u \ sup Ka|f)P > 2 



-(n-<5) 



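We now claim that

$$\sup_{|\alpha\rangle \in \mathcal{P}} |\langle\alpha|\Psi\rangle|^2 \ge \delta' \;\Longrightarrow\; \sup_{|\tilde\alpha\rangle \in \mathcal{N}_{\delta'/2, n}} |\langle\tilde\alpha|\Psi\rangle|^2 \ge \delta'/2. \tag{5.6.8}$$

To prove this claim, let $|\alpha\rangle$ be the state that achieves the supremum on the left hand
side, and let $|\tilde\alpha\rangle$ be the state closest to it in the $\delta'/2$-net. It is shown in [GFE09] that
this implies for any $|\Psi\rangle$

$$\left| |\langle\alpha|\Psi\rangle|^2 - |\langle\tilde\alpha|\Psi\rangle|^2 \right| \le \delta'/2. \tag{5.6.9}$$

Therefore

$$|\langle\tilde\alpha|\Psi\rangle|^2 \ge |\langle\alpha|\Psi\rangle|^2 - \delta'/2 \ge \delta'/2.$$

This implies that the supremum over all states in the net must be at least $\delta'/2$, which
proves the claim.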

We can now finish the proof. Set $\delta' = 2^{-(n - \delta)}$ in Eqn. 5.6.8 and use Eqn. 5.6.7
with $\gamma = \delta'/2$ to find



V>~J sup |(a|^')|2>2- 



2 ^ 9-{n-5)-l 



<P|^)^, I sup Ka|^)|^>2 

< (1 + e) exp2(A;log2 2k + 4nlog2 lOn - A:5 + 4n(n - S)). □ 



Combining this with the arguments of [GFE09j shows that most states in a state 
n^-design on n qubits are useless for MBQC. This shows that even many efficiently 
preparable states are useless. 



5.7 Conclusions

We have seen how to turn large deviation bounds for Haar-random unitaries into
bounds for k-designs. The main technique was applied to show that unitaries from
k-designs generate large amounts of entanglement. Then we showed that, if the
dynamics of the universe produced a k-design, the entanglement generated would be
sufficient to reproduce the principle of equal a priori probabilities. Finally we showed
that most states in sufficiently large state designs are useless for measurement-based
quantum computing, in the sense that computation using them can be efficiently
simulated classically.

However, there are other bounds for which our technique does not work. Since
we cannot obtain exponential bounds for polynomially sized designs, our technique
cannot directly derandomise some bounds. Some results, for example showing that
the $\infty$-norm of the reduced state of a random pure state is close to $1/d_S$ [HHL04], are
proven by using an $\epsilon$-net of states and the union bound. Since the $\epsilon$-net is
exponentially large, exponentially small bounds are required. We do not know how to apply
our idea to results of this kind and still have $k = \mathrm{poly}(\log d)$. (Note that we could
cope with the $\epsilon$-net in Section 5.6 since it was just a net on product states, which is
considerably smaller.)

It is also possible that our ideas could be used to completely derandomise some
constructions (e.g. locking [HLSW04, DHL+04]). If we could show that unitaries
drawn from a k-design work with non-zero probability, and come up with an efficient
sampling method, then we could obtain efficient randomised constructions.



5.8 Proof of Theorem 5.2.3

Here we prove the more convenient form of Lemma 5.4.3 stated as Theorem 5.2.3.

Proof of Theorem 5.2.3. Firstly, we will write the left hand side of Eqn. 5.4.2 in a
more useful way. Using $\ln(1 + x) \le x$, we find

$$-\log_2 \mu \ge \log_2 d_S - \beta$$

where $\beta = \frac{1}{\ln 2} \frac{d_S}{d_E}$, following the notation in [HLW06]. This means

$$\mathbb{P}_{U \sim \nu}\left(S(\psi_S) \le \log_2 d_S - \alpha - \beta\right) \le \mathbb{P}_{U \sim \nu}\left(S(\psi_S) \le -\log_2 \mu - \alpha\right).$$

We now simplify the right hand side. Let $\delta = 2^\alpha - 1$. For $d_S \ge 2$, we have $\mu \ge 1/d_S$.
We shall also assume that $m = k/8$. This gives us (using $\mu \le 1$)



P.~.(S(fe)<log,d.-a-/J)<(f) 4(^) + 



k/4 / / I. \k/8 / -I ^ 

(5.8.1) 



Now, one can easily show (e.g. by induction on n) that 

(l + (5)"<2 (5.8.2) 

for 2nd < 1. We use this for n = and 6 = The condition is then k < 2d^, 

which we shall assume (we will set A; = log d/ log log d later). We now obtain 



n^u{S{il^s) < log2 ds-a-P)< \^-fj ^ J + 2eJ . (5.8.3) 

We will now take e = 2 (2073) > so that the two terms are the same, logl/e is 
poly(log d) so this remains efficient. Now 

Assuming that (5^ > , we should take k as large as possible up to ^^^^'^ , when the 
right hand side is maximised. We then find the result after further simplification. □ 
Part II

Quantum Learning

Chapter 6

Learning and Testing Algorithms for the Clifford Group

6.1 Introduction 

A central problem in quantum computing is to determine an unknown quantum state 
from measurements of multiple copies of the state. This process is known as quantum 
state tomography (see [NC00] and references therein). By making enough measurements,
the probability distributions of the outcomes can be estimated from which
the state can be inferred. A related problem is that of quantum process tomography, 
where an unknown quantum evolution is determined by applying it to certain known 
input states. There are several methods for doing this, including what are known as 
Standard Quantum Process Tomography [CN97, PCZ97] and Ancilla Assisted Process
Tomography [DLP01, Leu03]. These methods work by using state tomography
on the output states for certain input states. 

However, all these procedures share one important downside: the number of mea- 
surements required increases exponentially with the number of qubits. This already 
presents problems even with systems achievable with today's technology, for which 
complete tomographical measurements can take hours (e.g. [HHR+05]) making tomography
of larger systems infeasible. Unfortunately this exponential cost is necessary to
determine a completely unknown state or process, since there are exponentially many
parameters to measure. To make tomography feasible for larger systems, we need to 
find a restriction that requires fewer measurements, ideally polynomially many. 

One way to improve the measurement, or query, complexity is to assume some prior 
knowledge of the process. For example, suppose the process was known to be one of a 
small number of unitaries, then the task is just to decide which. This is the approach 
we take here. As a simple example, consider being given a black box implementing 
an unknown Pauli matrix. By applying this to half a maximally entangled state, the 
Pauli can be identified with one query. This is essentially superdense coding [BW92]
and is explained in Section 6.3.1. Indeed, if the black box performed a tensor product
of arbitrary Paulis on n qubits then it too can be identified with just one query. 

We extend this to work for elements of the Clifford group (the normaliser of the 
Pauli group; see Definition 6.2.1) and show that any member of the Clifford group
can be learnt with $O(n)$ queries, which we show is optimal. The Clifford group is
an important subgroup of the unitary group that has found uses in quantum error
correction and fault tolerance [CRSS97, Sho96, Got98].

Then generalising further, we show that elements of the Gottesman-Chuang
hierarchy [GC99] (see Definition 6.2.2), also known as the $\mathcal{C}_k$ hierarchy, can also be
learnt efficiently. As the level $k$ increases, the set includes more and more unitaries
so this implies ever larger sets can be learnt, although the number of queries scales
exponentially with $k$. Our methods also work if the unitary is known to be close to a
Clifford (or any element of $\mathcal{C}_k$ for some known $k$) rather than exactly a Clifford.

We also give a Clifford testing algorithm, which determines whether an unknown 
unitary is close to a Clifford or far from every Clifford. This is an extension of the Pauli 
testing algorithm given in [MO08]. Indeed, our results are closely related to results in
[MO08] and we use some of the algorithms presented there as ingredients. Our results
can also be compared with [Aar07], which contains methods to approximately learn
quantum states. Another related result is that of Aaronson and Gottesman [AG09],
which provides a method of learning stabiliser states with linearly many copies.
We only consider query complexity although, at least for the Clifford group results,
our methods are computationally efficient too.

The rest of the chapter is organised as follows. In Section 6.2 we define the Pauli
and Clifford groups and the Gottesman-Chuang hierarchy. In Section 6.3 we present
our algorithm for exact learning of Clifford and $\mathcal{C}_k$ elements. In Section 6.4 we show
how to find the closest element of $\mathcal{C}_k$ to an unknown unitary. In Section 6.5 we present
our Clifford testing algorithm and then conclude in Section 6.6.

This chapter has been published previously as [Low09b].

6.2 The Pauli and Clifford Groups and the Gottesman- 
Chuang Hierarchy 

Firstly, we define the Pauli group. Call the set of all Pauli matrices on $n$ qubits $\hat{\mathcal{P}}$.
We then have $|\hat{\mathcal{P}}| = 4^n$. We write matrices in the Pauli basis using the normalisation
$\rho = \sum_p \hat\rho(p)\,\sigma_p$. To make $\hat{\mathcal{P}}$ into a group, the Pauli group $\mathcal{P}$, we must include each
matrix in $\hat{\mathcal{P}}$ with phases $\{\pm 1, \pm i\}$.

We can now define the Clifford group:

Definition 6.2.1 (The Clifford group). The Clifford group is the normaliser of the
Pauli group, i.e.

$$\mathcal{C} = \{U \in U(2^n) : U \mathcal{P} U^\dagger \subseteq \mathcal{P}\}.$$

Then the Gottesman-Chuang hierarchy is a generalisation:

Definition 6.2.2 (The Gottesman-Chuang hierarchy [GC99]). Let $\mathcal{C}_1$ be the Pauli
group $\mathcal{P}$. Then level $\mathcal{C}_k$ of the hierarchy is defined recursively:

$$\mathcal{C}_k = \{U \in U(2^n) : U \mathcal{P} U^\dagger \subseteq \mathcal{C}_{k-1}\}.$$

By definition, $\mathcal{C}_2$ is the Clifford group $\mathcal{C}$. For $k > 2$, $\mathcal{C}_k$ is no longer a group but
contains a universal gate set, whereas $\mathcal{C}_1$ and $\mathcal{C}_2$ are not universal.

6.3 Learning Gottesman-Chuang Operations



Before we give our algorithm for learning Gottesman-Chuang operations, we present 
a simple method for learning Pauli operations, which we use as the main ingredient. 

6.3.1 Learning Pauli Operations 

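This is due to [MO08] and is in fact identical to the superdense-coding protocol
[BW92].

Theorem 6.3.1 ([MO08], Proposition 20). Pauli operations can be identified with
one query and in time $O(n)$.

Proof. Apply the operator $\sigma_p$ to half of the maximally entangled state

$$|\psi\rangle = 2^{-n/2} \sum_i |ii\rangle.$$

For different choices of $\sigma_p$, the resulting states are orthogonal so can be perfectly
distinguished:

$$
\begin{aligned}
\langle\psi| (\sigma_p \otimes I)(\sigma_q \otimes I) |\psi\rangle
&= 2^{-n} \sum_{i,j} \langle ii| \sigma_p \sigma_q \otimes I |jj\rangle \\
&= 2^{-n} \sum_{i,j} \langle i| \sigma_p \sigma_q |j\rangle \langle i|j\rangle \\
&= 2^{-n} \sum_i \langle i| \sigma_p \sigma_q |i\rangle \\
&= 2^{-n}\, \mathrm{tr}\, \sigma_p \sigma_q \\
&= \delta_{pq}.
\end{aligned}
$$

The time complexity $O(n)$ comes from the preparation and measurement operations.

□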
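
The orthogonality used in this proof can be checked directly for small $n$. The sketch below is a classical simulation only, not the query algorithm itself: it builds the states $(\sigma_p \otimes I)|\psi\rangle$ as amplitude vectors and identifies a Pauli by finding the unique matching state. All function names are hypothetical.

```python
from itertools import product

# Single-qubit Pauli matrices as nested tuples.
PAULI_1Q = {
    "I": ((1, 0), (0, 1)),
    "X": ((0, 1), (1, 0)),
    "Y": ((0, -1j), (1j, 0)),
    "Z": ((1, 0), (0, -1)),
}

def kron(a, b):
    """Kronecker product of two matrices given as nested tuples."""
    return tuple(
        tuple(a[i][j] * b[k][l] for j in range(len(a[0])) for l in range(len(b[0])))
        for i in range(len(a))
        for k in range(len(b))
    )

def pauli(label):
    """n-qubit Pauli matrix for a label such as 'XZ'."""
    m = ((1,),)
    for ch in label:
        m = kron(m, PAULI_1Q[ch])
    return m

def encoded_state(label):
    """Amplitudes of (sigma_p (x) I)|psi>, where |psi> = 2^{-n/2} sum_i |ii>."""
    sigma = pauli(label)
    dim = len(sigma)
    # The amplitude on |a>|b> is (sigma_p)_{ab} / sqrt(dim).
    return [sigma[a][b] / dim ** 0.5 for a in range(dim) for b in range(dim)]

def overlap(u, v):
    """Inner product <u|v>."""
    return sum(complex(x).conjugate() * y for x, y in zip(u, v))

def identify(state, n):
    """Return the Pauli label whose encoded state matches `state` up to phase."""
    for label in map("".join, product("IXYZ", repeat=n)):
        if abs(abs(overlap(encoded_state(label), state)) - 1) < 1e-9:
            return label
    raise ValueError("state does not match any Pauli")

if __name__ == "__main__":
    print(identify(encoded_state("XZ"), 2))
```

In an experiment the final loop would be replaced by a single measurement in the basis formed by these $4^n$ states, which is why one query suffices.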
6.3.2 Learning Clifford Operations

We can now present our algorithm for learning Clifford operations to illustrate our
main idea for learning unitaries in the Gottesman-Chuang hierarchy. We will use the
fact that knowing how a unitary acts by conjugation on all elements of $\hat{\mathcal{P}}$ identifies it
uniquely (up to phase):

Lemma 6.3.2. Knowing $U \sigma_p U^\dagger$ for all $\sigma_p \in \hat{\mathcal{P}}$ uniquely determines $U$, up to global
phase.

Proof. The Pauli matrices form a basis for all $2^n \times 2^n$ matrices so knowing the action
of $U$ on the Paulis is enough to determine the action of $U$ on any matrix up to phase.
The phase cannot be determined because action by conjugation does not reveal the
phase. □

Now let $G = \{\sigma_{x_i}, \sigma_{z_i}\}_{i=1}^n$ where $\sigma_{x_i}$ ($\sigma_{z_i}$) is the matrix with $\sigma_x$ ($\sigma_z$) acting on
qubit $i$ and trivially elsewhere. We think of this as a set of generators for $\hat{\mathcal{P}}$ since
each element of $\hat{\mathcal{P}}$ can be written as a product of elements of $G$, up to phase. Using
this, knowledge of how $U$ acts on elements of $G$ is sufficient to determine the action
on all of $\hat{\mathcal{P}}$:

Lemma 6.3.3. $U \sigma_p U^\dagger$ for any $\sigma_p \in \hat{\mathcal{P}}$ can be calculated from knowledge of $U \sigma_g U^\dagger$
for each $\sigma_g \in G$.

Proof. Let $\sigma_p = a\, \sigma_{g_1} \cdots \sigma_{g_m}$ for $\sigma_{g_i} \in G$ where $a$ is a phase. Then

$$U \sigma_p U^\dagger = a\, U \sigma_{g_1} \cdots \sigma_{g_m} U^\dagger = a\, U \sigma_{g_1} U^\dagger\, U \sigma_{g_2} U^\dagger \cdots U \sigma_{g_m} U^\dagger.$$

□
With these definitions and observations, we can now present the Clifford learning 
algorithm. 

Theorem 6.3.4. Given oracle access to an unknown Clifford operation $C$ and its
conjugate $C^\dagger$, $C$ can be determined exactly (up to global phase) with $2n + 1$ queries to
$C$ and $2n$ to $C^\dagger$. The algorithm runs in time $O(n^2)$.

Proof. From the definition of the Clifford group, $C \sigma_p C^\dagger \in \mathcal{P}$ for all $\sigma_p \in \mathcal{P}$. Note
that $C \sigma_p C^\dagger$ is not necessarily a Pauli operator in $\hat{\mathcal{P}}$ because there is a phase of $\pm 1$
(complex phases are not allowed because $C \sigma_p C^\dagger$ is Hermitian). Determining which
Pauli operator and phase for every $\sigma_p$ would be sufficient to learn $C$ using Lemma
6.3.2. But from Lemma 6.3.3 we only need to know $C \sigma_g C^\dagger$ for each $\sigma_g \in G$.

Let $C \sigma_{x_i} C^\dagger = \alpha_i \sigma_{a_i}$ and $C \sigma_{z_i} C^\dagger = \beta_i \sigma_{b_i}$, where $\alpha_i, \beta_i = \pm 1$. Knowing just $\sigma_{a_i}$
and $\sigma_{b_i}$ is enough to specify $C$ up to a Pauli correction factor $\sigma_q$ which gives the
phases $\alpha_i$ and $\beta_i$. Choosing $\sigma_q$ that anticommutes with $\sigma_{x_i}$ flips the sign of $\alpha_i$ and
similarly for $\beta_i$. We now present the algorithm:

1. Apply $C \sigma_{x_i} C^\dagger$ and $C \sigma_{z_i} C^\dagger$ for each $i$ and use Theorem 6.3.1 to determine $\sigma_{a_i}$
and $\sigma_{b_i}$. This uses $2n$ queries to both $C$ and $C^\dagger$.

2. Let $C'$ be such that $C' \sigma_{x_i} C'^\dagger = \sigma_{a_i}$ and $C' \sigma_{z_i} C'^\dagger = \sigma_{b_i}$, i.e. the phases are all
$+1$. Then, choosing a phase for $C'$, we can write $C = C' \sigma_q$ where

$$\sigma_q = \prod_{i : \alpha_i = -1} \sigma_{z_i} \prod_{i : \beta_i = -1} \sigma_{x_i}. \tag{6.3.1}$$

Then implement $C'^\dagger C$ to determine $\sigma_q$ using Theorem 6.3.1. This uses one query
to $C$. We can now calculate the phases $\alpha_i$ and $\beta_i$.

To work out the time complexity, note that in step 1 the $O(n)$ time Pauli learning
algorithm is called $2n$ times. Then for step 2, the Clifford $C'$ can be implemented in
$O(n^2)$ time using for example Theorem 10.6 of [NC00]. □
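
The data that the algorithm recovers, namely the images $C \sigma_g C^\dagger$ together with their signs, can be illustrated for a single qubit. The sketch below is a classical matrix computation rather than the query algorithm; the helper names are hypothetical, and the two example gates (the Hadamard gate and the Hadamard gate followed on the right by $\sigma_z$) are chosen here only for illustration.

```python
import math

PAULIS = {
    "I": [[1, 0], [0, 1]],
    "X": [[0, 1], [1, 0]],
    "Y": [[0, -1j], [1j, 0]],
    "Z": [[1, 0], [0, -1]],
}

def matmul(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def dagger(a):
    """Conjugate transpose of a 2x2 matrix."""
    return [[complex(a[j][i]).conjugate() for j in range(2)] for i in range(2)]

def conjugate_pauli(c, label):
    """Return (sign, name) such that c * sigma_label * c^dagger = sign * sigma_name."""
    image = matmul(matmul(c, PAULIS[label]), dagger(c))
    for name, p in PAULIS.items():
        # tr(p * image) / 2 is +1 or -1 for the matching Pauli and 0 otherwise.
        t = complex(sum(matmul(p, image)[i][i] for i in range(2)) / 2)
        if abs(abs(t) - 1) < 1e-9 and abs(t.imag) < 1e-9:
            return round(t.real), name
    raise ValueError("c is not a Clifford operation")

def conjugation_table(c):
    """Images of the generators sigma_x and sigma_z under conjugation by c."""
    return {g: conjugate_pauli(c, g) for g in ("X", "Z")}

S2 = 1 / math.sqrt(2)
HADAMARD = [[S2, S2], [S2, -S2]]
HZ = matmul(HADAMARD, PAULIS["Z"])

if __name__ == "__main__":
    print(conjugation_table(HADAMARD))
    print(conjugation_table(HZ))
```

The two gates send $\sigma_x$ and $\sigma_z$ to the same Paulis and differ only in the sign $\alpha_1$, which matches Eqn. 6.3.1 with $C' = H$ and $\sigma_q = \sigma_z$.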

We now show that this algorithm is optimal, in terms of number of queries, up to 
constant factors: 

Lemma 6.3.5. Any method of learning a Clifford gate requires at least $n$ queries.

Proof. Each application of the gate $C$ can give at most $2n$ bits of mutual information
about $C$. This follows from the optimality of superdense coding [BW92]. The Clifford
group (modulo global phase) is of size [CRSS98] $2^{n^2 + 2n + 3} \prod_{j=1}^n (4^j - 1) > 2^{2n^2 + n + 3}$.
To identify an element with $m$ queries, we therefore need

$$2^{2nm} \ge 2^{2n^2 + n + 3} \tag{6.3.2}$$

which implies $m \ge n$. □
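
The counting argument can be checked with exact integer arithmetic. This is a small sketch using the group size exactly as quoted in the proof above; the function names are hypothetical.

```python
def clifford_group_size(n):
    """Group size as quoted in the proof: 2^(n^2 + 2n + 3) * prod_{j=1}^n (4^j - 1)."""
    size = 2 ** (n * n + 2 * n + 3)
    for j in range(1, n + 1):
        size *= 4 ** j - 1
    return size

def min_queries(n):
    """Smallest m with 2^(2nm) >= group size, i.e. enough bits to name an element."""
    size = clifford_group_size(n)
    m = 0
    while 2 ** (2 * n * m) < size:
        m += 1
    return m

if __name__ == "__main__":
    for n in range(1, 7):
        print(n, min_queries(n))
```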

It is unfortunate that access to $C^\dagger$ is also required, but we do not know a method
with optimal query complexity that works without $C^\dagger$. There are however methods
that use $O(n^2)$ queries that do not use $C^\dagger$. The result of [HW06] can be used to
show that $O(n^2)$ queries to $C$ are sufficient, by distinguishing the states $(C \otimes I)|\psi\rangle$ for
different Cliffords $C$, where $|\psi\rangle$ is the maximally entangled state. We can use
Lemma 6.4.4 to show that these states are far apart in the distance measure used in
[HW06], allowing us to apply their result.

6.3.3 Learning Gottesman-Chuang Operations 

Theorem 6.3.4 can easily be generalised to learning any operation from the $\mathcal{C}_k$
hierarchy:

Theorem 6.3.6. Given oracle access to an unknown operation $C \in \mathcal{C}_k$ and its
conjugate $C^\dagger$, $C$ can be determined exactly (up to phase) with $\frac{(2n)^k - 1}{2n - 1}$ queries to $C$ and
$(2n)^{k-1}$ to $C^\dagger$.

Proof. The proof is by induction. The base case is for the Paulis and is proven in
Theorem 6.3.1. Then, to learn $C \in \mathcal{C}_{k+1}$, we assume we have a learning algorithm for
members of $\mathcal{C}_k$. Apply $C \sigma_g C^\dagger$ for each $\sigma_g \in G$. These operations are elements of $\mathcal{C}_k$
so use the learning algorithm for $\mathcal{C}_k$ to determine these up to phase. Then use the
last step of Theorem 6.3.4 to determine the phases.

We now determine the number of queries to $C$ and $C^\dagger$. Let $T(k)$ be the number
of queries to $C$ and $T'(k)$ the number of queries to $C^\dagger$. We have the recurrences

$$T(k + 1) = 2n\,T(k) + 1, \qquad T(1) = 1 \tag{6.3.3}$$

and

$$T'(k + 1) = 2n\,T'(k), \qquad T'(2) = 2n \tag{6.3.4}$$

which have solutions $T(k) = \frac{(2n)^k - 1}{2n - 1}$ and $T'(k) = (2n)^{k-1}$ (with $T'(1) = 0$).

□
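
The closed forms can be confirmed by iterating the recurrences of Eqns. 6.3.3 and 6.3.4 directly. This is a short check with hypothetical function names.

```python
def queries_to_c(n, k):
    """Iterate T(k+1) = 2n T(k) + 1 from T(1) = 1."""
    t = 1
    for _ in range(k - 1):
        t = 2 * n * t + 1
    return t

def queries_to_c_dagger(n, k):
    """Iterate T'(k+1) = 2n T'(k) from T'(2) = 2n, with T'(1) = 0."""
    if k == 1:
        return 0
    t = 2 * n
    for _ in range(k - 2):
        t = 2 * n * t
    return t

if __name__ == "__main__":
    # For the Clifford group (k = 2) on n = 3 qubits: 2n + 1 and 2n queries.
    print(queries_to_c(3, 2), queries_to_c_dagger(3, 2))
```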



6.4 Learning Unitaries Close to Ck Elements 

Here we suppose that we are given a unitary that is known to be close to an element 
of Ck for some given k. We present a method for finding this element. But first we 
must define our distance measure. 

We would like our distance measure to not distinguish between unitaries that
differ by just an unobservable global phase. We define a 'distance' $D$ below with this
property. However, firstly define the distance $D^+$ to be a normalised 2-norm distance:

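Definition 6.4.1. For $U_1$ and $U_2$ $d \times d$ matrices,

$$D^+(U_1, U_2) := \frac{1}{\sqrt{2d}} \|U_1 - U_2\|_2$$

where $\|A\|_2 = \sqrt{\mathrm{tr}\, A^\dagger A}$.

We have chosen the normalisation so that $0 \le D^+(U_1, U_2) \le \sqrt{2}$ for unitaries. We now define
our phase invariant 'distance':

Definition 6.4.2. For $U_1$ and $U_2$ $d \times d$ matrices,

$$D(U_1, U_2) := \frac{1}{\sqrt{2}\,d} \|U_1 \otimes U_1^* - U_2 \otimes U_2^*\|_2.$$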
This is not a true distance since $D(U_1, U_2) = 0$ does not imply $U_1 = U_2$, but that
$U_1$ and $U_2$ are the same up to a phase so the difference is unobservable. From the
2-norm definition, we can show:

Lemma 6.4.3.

$$D^+(U_1, U_2) = \sqrt{1 - \mathrm{Re}\, \frac{\mathrm{tr}\, U_1^\dagger U_2}{d}} \tag{6.4.1}$$

and

$$D(U_1, U_2) = \sqrt{1 - \left| \frac{\mathrm{tr}\, U_1 U_2^\dagger}{d} \right|^2}. \tag{6.4.2}$$

From this we can easily see that $0 \le D(U_1, U_2) \le 1$ with equality in the upper bound if and only if
$U_1$ and $U_2$ are orthogonal. Further note that by the unitary invariance of the 2-norm,
both $D$ and $D^+$ are unitarily invariant and from the triangle inequality for the 2-norm
they both obey the triangle inequality.
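
The difference between the two distances is easy to see numerically. The sketch below evaluates the closed forms of Lemma 6.4.3 for single-qubit examples; the function names are hypothetical, and the small `max(0.0, ...)` guard is only there to absorb floating-point rounding.

```python
import cmath
import math

def hs_inner(u1, u2):
    """Hilbert-Schmidt inner product tr(U1^dagger U2) for nested-list matrices."""
    return sum(complex(a).conjugate() * b
               for r1, r2 in zip(u1, u2) for a, b in zip(r1, r2))

def d_plus(u1, u2):
    """D^+ from Eqn. 6.4.1; sensitive to global phase."""
    d = len(u1)
    return math.sqrt(max(0.0, 1 - hs_inner(u1, u2).real / d))

def d_phase_invariant(u1, u2):
    """D from Eqn. 6.4.2; note |tr(U1^dagger U2)| = |tr(U1 U2^dagger)|."""
    d = len(u1)
    return math.sqrt(max(0.0, 1 - abs(hs_inner(u1, u2) / d) ** 2))

if __name__ == "__main__":
    x = [[0, 1], [1, 0]]
    z = [[1, 0], [0, -1]]
    phase = cmath.exp(0.7j)
    x_phase = [[phase * entry for entry in row] for row in x]
    print(d_plus(x, x_phase), d_phase_invariant(x, x_phase))
    print(d_phase_invariant(x, z))
```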

Our approximate learning method will find the unique closest element of $\mathcal{C}_k$ to $U$.
In order to guarantee uniqueness, the distance must be upper bounded:

Lemma 6.4.4. If $D(U, C) < \frac{1}{2^{k-1/2}}$ for some $C \in \mathcal{C}_k$ then $C$ is unique up to phase.

The proof is in Section 6.7.

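Theorem 6.4.5. Given oracle access to $U$ and $U^\dagger$ and $k$ such that $D(U, C) \le \epsilon$ for
some $C \in \mathcal{C}_k$ with

\[ \epsilon' := \sqrt{2\left(1 - (2^{k-1}\epsilon)^2\right)} - 1 > 0 \tag{6.4.3} \]

then $C$ can be determined with probability at least $1 - \delta$ with

\[ O\!\left( \frac{1}{\epsilon'^2} \, (2n)^{k-1} \log \frac{(2n+1)^{k-1}}{\delta} \right) \]

queries.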

Proof. By Lemma 6.4.4, $C$ is unique up to phase. We now prove the theorem by
induction.

For $k = 1$, use Proposition 21 of [MO08] to learn the closest Pauli operator. This
works by repeating the Pauli learning method of Theorem 6.3.1 and taking the majority
vote. This uses $O\!\left(\frac{1}{\epsilon'^2} \log \frac{1}{\delta}\right)$ queries to succeed with probability at least $1 - \delta$.

Now for the inductive step. Assume we have a learning algorithm for level $k$.
Then for $C \in \mathcal{C}_{k+1}$, let $C\sigma_{g_i}C^\dagger = C_{g_i}$ for $\sigma_{g_i} \in G$. By Lemma 6.8.1 we have
$D(U\sigma_{g_i}U^\dagger, C_{g_i}) \le 2\epsilon$. Use the learning algorithm for level $k$ to determine $C_{g_i}$ up to
phase for all $i$. Then to find the phases we use the same method as before: implement
any $C'$ with $C'\sigma_{g_i}C'^\dagger = \pm C_{g_i}$ for any (known) choice of phase. Then $C = C'\sigma_q$ for
some Pauli operator $\sigma_q$. We can determine $\sigma_q$ by implementing $C'^\dagger U$ and using the
$k = 1$ learning algorithm since

\[ D(C'^\dagger U, \sigma_q) = D(U, C'\sigma_q) = D(U, C) \le \epsilon. \tag{6.4.4} \]

Now we calculate the success probabilities and number of queries. There are $2n + 1$
calls to the algorithm at lower levels, which all succeed with probability at least $1 - \delta$.
So at this level the success probability is at least $1 - (2n + 1)\delta$. So to succeed with
probability at least $1 - \delta$ we must replace $\delta$ with $\delta/(2n + 1)$. Then the overall number
of queries is

\[ 2n \cdot O\!\left( \frac{1}{\epsilon'^2} (2n)^{k-1} \log \frac{(2n+1)^{k}}{\delta} \right) + 1 = O\!\left( \frac{1}{\epsilon'^2} (2n)^{k} \log \frac{(2n+1)^{k}}{\delta} \right). \tag{6.4.5} \]

□ 

We remark that there is only $O(k \log n)$ overhead (for constant $\epsilon'$ and $\delta$) over the
exact learning algorithm of Theorem 6.3.6.
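To get a feel for the condition on $\epsilon$, the sketch below evaluates $\epsilon' = \sqrt{2(1 - (2^{k-1}\epsilon)^2)} - 1$ and the uniqueness radius $2^{-(k-1/2)}$, and checks that $\epsilon' > 0$ exactly when $\epsilon$ lies inside that radius. The function `query_scaling` evaluates only the expression inside the $O(\cdot)$ with the hidden constant set to one (an assumption made purely for illustration), and all function names are hypothetical.

```python
import math


def uniqueness_radius(k: int) -> float:
    """Radius 2^-(k - 1/2) below which the closest level-k element is unique."""
    return 2.0 ** -(k - 0.5)


def eps_prime(eps: float, k: int) -> float:
    """eps' = sqrt(2 (1 - (2^(k-1) eps)^2)) - 1; negative values mean no guarantee."""
    inner = 2.0 * (1.0 - (2.0 ** (k - 1) * eps) ** 2)
    return math.sqrt(max(0.0, inner)) - 1.0


def query_scaling(n: int, k: int, eps: float, delta: float) -> float:
    """(1/eps'^2) (2n)^(k-1) log((2n+1)^(k-1) / delta), with the O() constant set to 1."""
    ep = eps_prime(eps, k)
    if ep <= 0:
        raise ValueError("eps is outside the range where eps' > 0")
    return (2 * n) ** (k - 1) * math.log((2 * n + 1) ** (k - 1) / delta) / ep ** 2


if __name__ == "__main__":
    # Clifford group (k = 2) with eps = 1/3, the value used for Clifford testing.
    print(uniqueness_radius(2), eps_prime(1 / 3, 2))
    print(query_scaling(n=4, k=2, eps=0.1, delta=0.01))
```

For $k = 2$ and $\epsilon = 1/3$ this gives $\epsilon' = \sqrt{10}/3 - 1 \approx 0.054$, which is positive but small, so the $1/\epsilon'^2$ factor is large even though it is constant in $n$.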

6.5 Clifford Testing 

Here we present an efficient algorithm to determine whether an unknown unitary
operation is close to a Clifford or far from every Clifford. Whereas the previous
results allow us to find the Clifford operator close to the given black box unitary,
in this section we are concerned with determining how far the given unitary is from
any Clifford. We do not measure this directly, but provide an algorithm of low query
complexity that decides if the given unitary is close to a Clifford or far from all.
This type of algorithm is known in computer science as a property testing algorithm
and has many applications, including the theory of probabilistically checkable proofs
[ALM+98]. The result in this section could be extended to work for any level of the
Gottesman-Chuang hierarchy, although for simplicity we only present the version for
Cliffords.

The key ingredient to our method will be a way of estimating the Pauli coefficients:

Lemma 6.5.1 (Lemma 23 of [MO08]). For any $p \in \{I, x, y, z\}^n$ and unitary $U$, the
Pauli coefficients $|\gamma(p)| = \frac{1}{2^n} |\mathrm{tr}\, U\sigma_p|$ can be estimated to within $\pm\eta$ with probability
$1 - \delta$ using $O\!\left(\frac{1}{\eta^2} \log \frac{1}{\delta}\right)$ queries.

This is a generalisation of Theorem 6.3.1 and the method is similar. Instead of
there being only one possible outcome, now the probability of obtaining the outcome
corresponding to $\sigma_p$ is estimated. This probability is equal to $|\gamma(p)|^2$.
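The statistics of this estimation step can be illustrated classically. The sketch below is not the quantum procedure itself: it replaces the measurement by a biased coin whose success probability plays the role of $|\gamma(p)|^2$ (an assumption), and uses the standard Hoeffding bound to choose a sample count of order $\frac{1}{\eta^2}\log\frac{1}{\delta}$ for estimating that probability to within $\pm\eta$. Function names are hypothetical.

```python
import math
import random


def samples_needed(eta: float, delta: float) -> int:
    """Hoeffding bound: samples so an empirical frequency is within eta w.p. >= 1 - delta."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * eta ** 2))


def estimate_outcome_probability(true_prob: float, eta: float, delta: float,
                                 rng: random.Random) -> float:
    """Simulate repeated measurements and return the observed outcome frequency."""
    n_samples = samples_needed(eta, delta)
    hits = sum(1 for _ in range(n_samples) if rng.random() < true_prob)
    return hits / n_samples


if __name__ == "__main__":
    rng = random.Random(1)
    gamma_abs = 0.9                      # illustrative value of |gamma(p)|
    prob_hat = estimate_outcome_probability(gamma_abs ** 2, eta=0.05, delta=1e-6, rng=rng)
    print(prob_hat, math.sqrt(prob_hat))  # estimated probability and coefficient modulus
```

Taking the square root of the estimated frequency gives an estimate of $|\gamma(p)|$; when $|\gamma(p)|$ is close to 1, as in the testing application, the error in the modulus is comparable to the error in the probability.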

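Theorem 6.5.2. Given oracle access to $U$ and $U^\dagger$ with the promise that for $0 < \epsilon < 1$
either

a) CLOSE: there exists $C \in \mathcal{C}$ such that $D(U, C) \le \frac{\epsilon}{\sqrt{32}\,n}$, or

b) FAR: for all $C \in \mathcal{C}$, $D(U, C) > \epsilon$ and there exists $C \in \mathcal{C}$ such that $D(U, C) < 1/3$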



.3 



holds then there is a O log ^ j algorithm that determines which with probability at 
least $1 - \delta$.

Proof. In both cases, we have that $D(U, C) < 1/3$ for some $C$, which ensures that $C$
is unique (using Lemma 6.4.4, since $\frac{1}{3} < \frac{1}{2^{3/2}}$) and can be found using Theorem 6.4.5
with $O\!\left(n \log \frac{n}{\delta}\right)$ queries. Then the algorithm is:

1. For each $\sigma_g \in G$, measure the Pauli coefficient of $C\sigma_gC^\dagger$ in $U\sigma_gU^\dagger$ (i.e. measure
$|\mathrm{tr}\, U\sigma_gU^\dagger C\sigma_gC^\dagger|/2^n$) to precision using Lemma 6.5.1.
2. If all the coefficients are found to have modulus at least 1 − then output
CLOSE else output FAR. 

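This works because, for the two possibilities CLOSE and FAR:

a) Using Lemma 6.8.1, $D(U, C) \le \frac{\epsilon}{\sqrt{32}\,n}$ implies that for all $\sigma_p \in \mathcal{P}$,

\[ D(U\sigma_pU^\dagger, C\sigma_pC^\dagger) \le \frac{2\epsilon}{\sqrt{32}\,n}. \tag{6.5.1} \]

Since we will only apply $U\sigma_gU^\dagger$ for $\sigma_g \in G$ we restrict this to only the generators
to find that for all $\sigma_g \in G$,

\[ D(U\sigma_gU^\dagger, C\sigma_gC^\dagger) \le \frac{2\epsilon}{\sqrt{32}\,n} \tag{6.5.2} \]

giving

\[ \left| \frac{\mathrm{tr}\, U\sigma_gU^\dagger C\sigma_gC^\dagger}{2^n} \right|^2 \ge 1 - \frac{\epsilon^2}{8n^2} \tag{6.5.3} \]

for every generator $\sigma_g$. We need a bound on the non-squared coefficients, which
follows directly since the modulus is at most 1:

\[ \left| \frac{\mathrm{tr}\, U\sigma_gU^\dagger C\sigma_gC^\dagger}{2^n} \right| \ge 1 - \frac{\epsilon^2}{8n^2}. \tag{6.5.4} \]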



Therefore when measuring the coefficients to precision , all results will give 
at least 1 — j^p-- 



b) Using the contrapositive of Lemma 6.8.2, $D(U, C) > \epsilon$ implies that there exists
$\sigma_p \in \mathcal{P}$ such that

\[ D^+(U\sigma_pU^\dagger, C\sigma_pC^\dagger) > \epsilon. \tag{6.5.5} \]

Using the contrapositive of Lemma 6.8.3, this in turn implies there exists $\sigma_g \in G$
such that

\[ D^+(U\sigma_gU^\dagger, C\sigma_gC^\dagger) > \frac{\epsilon}{2n} \tag{6.5.6} \]

which means that for at least one $\sigma_g \in G$, $U\sigma_gU^\dagger$ will have a small overlap with
$C\sigma_gC^\dagger$, i.e. there exists $\sigma_g \in G$ such that

\[ \left| \frac{\mathrm{tr}\, U\sigma_gU^\dagger C\sigma_gC^\dagger}{2^n} \right| < 1 - \frac{\epsilon^2}{4n^2}. \tag{6.5.7} \]

The $C$ returned by the application of Theorem 6.4.5 is such that $\mathrm{tr}\, U\sigma_gU^\dagger C\sigma_gC^\dagger$
is positive, which justifies inserting the absolute value signs above when using
$D^+$ rather than $D$. This implies that at least one coefficient will be found to be
less than 1 — when measuring to precision □ 
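The decision step of the test can be sketched as follows. The numerical choices here are assumptions made for illustration, not values taken from the text: the sketch supposes that in the CLOSE case every generator coefficient is at least $1 - \frac{\epsilon^2}{8n^2}$ and that in the FAR case at least one is below $1 - \frac{\epsilon^2}{4n^2}$, and then picks a measurement precision of $\frac{\epsilon^2}{16n^2}$ with a threshold halfway between the two bounds. The function names are hypothetical.

```python
from typing import Iterable


def assumed_precision(n: int, eps: float) -> float:
    """Assumed measurement precision: a quarter of the gap between the two bounds."""
    return eps ** 2 / (16.0 * n ** 2)


def assumed_threshold(n: int, eps: float) -> float:
    """Assumed acceptance threshold: midpoint of 1 - eps^2/(8n^2) and 1 - eps^2/(4n^2)."""
    return 1.0 - 3.0 * eps ** 2 / (16.0 * n ** 2)


def clifford_test_decision(estimates: Iterable[float], n: int, eps: float) -> str:
    """Output CLOSE if every estimated coefficient modulus clears the threshold."""
    threshold = assumed_threshold(n, eps)
    return "CLOSE" if all(abs(c) >= threshold for c in estimates) else "FAR"


if __name__ == "__main__":
    n, eps = 3, 0.5
    precision = assumed_precision(n, eps)
    close_bound = 1.0 - eps ** 2 / (8.0 * n ** 2)
    far_bound = 1.0 - eps ** 2 / (4.0 * n ** 2)
    # Worst-case estimates: true value at the bound, error just inside the precision.
    close_estimates = [close_bound - 0.9 * precision] * (2 * n)
    far_estimates = [1.0] * (2 * n - 1) + [far_bound + 0.9 * precision]
    print(clifford_test_decision(close_estimates, n, eps))
    print(clifford_test_decision(far_estimates, n, eps))
```

With these assumed values the two cases are separated even when every estimate errs by almost the full precision in the unfavourable direction.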

6.6 Conclusions and Further Work 

We have shown how to exactly identify an unknown Clifford operator in $O(n)$ queries,
which we show is optimal. This is then extended to cover elements of the $\mathcal{C}_k$ hierarchy
and unitaries that are only known to be close to $\mathcal{C}_k$ operations. The key to
the Clifford learning algorithm is to apply $C\sigma_pC^\dagger$ and then find the resulting Pauli
operator.

A way of extending this idea could be to learn unitaries from larger sets. Suppose
$\mathcal{V}$ is a set of unitaries with the property that for every $V \in \mathcal{V}$, $V\sigma_pV^\dagger$ is a linear
combination of a constant number of Paulis. Then $V$ can be learnt in the same
way as above, using the quantum Goldreich-Levin algorithm of [MO08], which can
efficiently find which Paulis have large overlap with an input unitary. However, we
have not been able to find interesting sets $\mathcal{V}$ other than the Clifford group with this
property.

We also presented a Clifford testing algorithm, which determines whether a given
black-box unitary is close to a Clifford or far from every Clifford. This can be seen
as a quantum generalisation of quadratic testing, just as Pauli testing can be seen
as a quantum generalisation of linearity testing. Property testing of this form is
used to prove the PCP theorem [ALM+98], so these quantum testing results could
potentially be useful in proving a quantum PCP theorem. It would also be interesting
to strengthen the testing method in Theorem 6.5.2 to remove the $O(1/n)$ difference
between the close and far conditions.

Finally, it would be interesting to see if it is possible to remove the requirement
to have access to $U^\dagger$. However, using both $U$ and $U^\dagger$ is the key to our method so we
do not know if a method without $U^\dagger$ is possible with low query complexity.

6.7 Proof of Lemma 6.4.4 

Proof of Lemma 6.4.4. The proof is by induction. The base case is $k = 1$, when
we have the Pauli group. Without loss of generality, assume $C$ is a Pauli operator
with no phase. Let $C = \sigma_p$.

Expand $U$ in the Pauli basis:

\[ U = \sum_q \gamma(q)\, \sigma_q. \tag{6.7.1} \]

Since $U$ is unitary, we have $\sum_q |\gamma(q)|^2 = 1$. By Lemma 6.4.3,

\[ D(U, \sigma_p) = \sqrt{1 - |\gamma(p)|^2} \le \epsilon \tag{6.7.2} \]

which implies

\[ |\gamma(p)|^2 \ge 1 - \epsilon^2. \tag{6.7.3} \]

Now, suppose for contradiction that there exist $\sigma_{p_1} \ne \sigma_{p_2}$ with $D(U, \sigma_{p_1}) \le \epsilon$ and
$D(U, \sigma_{p_2}) \le \epsilon$. Then by the above, $|\gamma(p_1)|^2, |\gamma(p_2)|^2 \ge 1 - \epsilon^2$. But there is also the
constraint $|\gamma(p_1)|^2 + |\gamma(p_2)|^2 \le 1$, which combined give

\[ \epsilon \ge \frac{1}{\sqrt{2}} \tag{6.7.4} \]

which is false by assumption. This implies $\sigma_{p_1} = \sigma_{p_2}$, which proves the base case.

To prove the inductive step, again assume for contradiction that there exist
$C_1, C_2 \in \mathcal{C}_{k+1}$ with $C_1 \ne C_2$ and $D(U, C_1) \le \epsilon$ and $D(U, C_2) \le \epsilon$. Then there
exists $\sigma_g \in G$ with

\[ C_1\sigma_gC_1^\dagger =: C_{1g} \ne C_{2g} := C_2\sigma_gC_2^\dagger. \tag{6.7.5} \]

Here, $C_{1g}, C_{2g} \in \mathcal{C}_k$.

Using Lemma 6.8.1, $D(U\sigma_gU^\dagger, C_{1g}) \le 2\epsilon$ and $D(U\sigma_gU^\dagger, C_{2g}) \le 2\epsilon$.

Now there are two cases. Firstly, suppose we can choose $\sigma_g$ such that $C_1\sigma_gC_1^\dagger \ne
\pm C_2\sigma_gC_2^\dagger$. Then $C_{1g}$ and $C_{2g}$ are not equivalent up to phase so, using the inductive
hypothesis, we must have

\[ 2\epsilon \ge \frac{1}{2^{k-1/2}} \tag{6.7.6} \]

or

\[ \epsilon \ge \frac{1}{2^{(k+1)-1/2}} \tag{6.7.7} \]

which is again false by assumption.

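For the other case, $C_1\sigma_gC_1^\dagger = \pm C_2\sigma_gC_2^\dagger$ for all $\sigma_g \in G$. This implies that $C_2 =
C_1\sigma_q$ for some Pauli $\sigma_q \ne I$. Then we have

\[ D(U, C_1) \le \epsilon, \qquad D(U, C_1\sigma_q) \le \epsilon \tag{6.7.8} \]

which by unitary invariance gives

\[ D(C_1^\dagger U, I) \le \epsilon, \qquad D(C_1^\dagger U, \sigma_q) \le \epsilon. \tag{6.7.9} \]

But we proved that this is impossible in this range of $\epsilon$ in the $k = 1$ proof above. □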
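A small single-qubit example shows why the $k = 1$ radius cannot be enlarged beyond $1/\sqrt{2}$. The unitary $U = (I + iZ)/\sqrt{2}$ used below is an illustrative choice, not taken from the text; the sketch uses the closed form $D(U_1, U_2) = \sqrt{1 - |\mathrm{tr}\, U_1U_2^\dagger/d|^2}$, and the function names are hypothetical.

```python
import math


def dist(u1, u2):
    """Phase-invariant distance sqrt(1 - |tr(U1 U2^dagger)/d|^2)."""
    d = len(u1)
    # tr(U1 U2^dagger) = sum_ij U1[i][j] * conj(U2[i][j])
    overlap = sum(u1[i][j] * u2[i][j].conjugate() for i in range(d) for j in range(d))
    return math.sqrt(max(0.0, 1.0 - abs(overlap / d) ** 2))


def rotation_about_z(theta):
    """cos(theta) I + i sin(theta) Z, a diagonal single-qubit unitary."""
    return [[complex(math.cos(theta), math.sin(theta)), 0j],
            [0j, complex(math.cos(theta), -math.sin(theta))]]


IDENTITY = [[1 + 0j, 0j], [0j, 1 + 0j]]
PAULI_Z = [[1 + 0j, 0j], [0j, -1 + 0j]]

if __name__ == "__main__":
    boundary = rotation_about_z(math.pi / 4)   # (I + iZ)/sqrt(2)
    print(dist(boundary, IDENTITY), dist(boundary, PAULI_Z))
    inside = rotation_about_z(0.3)
    print(dist(inside, IDENTITY), dist(inside, PAULI_Z))
```

At $\theta = \pi/4$ the unitary is at distance exactly $1/\sqrt{2}$ from both $I$ and $Z$, so no unique closest Pauli exists there; for $\theta = 0.3$ only $I$ lies within the radius.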



6.8 Miscellaneous Lemmas 

Here we prove some miscellaneous lemmas used earlier in the chapter. 

The first lemma says that for two close operators $U_1$ and $U_2$, $U_1\sigma_pU_1^\dagger$ is close to
$U_2\sigma_pU_2^\dagger$ for all Paulis $\sigma_p$:

Lemma 6.8.1. If $D(U_1, U_2) \le \delta$ then for all $\sigma_p \in \mathcal{P}$,

\[ D(U_1\sigma_pU_1^\dagger, U_2\sigma_pU_2^\dagger) \le 2\delta. \]

Proof. Let $U_1 = VU_2$ and $U_{2p} = U_2\sigma_pU_2^\dagger$. Then we simply apply the triangle
inequality for $D$ and unitary invariance:

\begin{align*}
D(U_1\sigma_pU_1^\dagger, U_2\sigma_pU_2^\dagger) &= D(VU_{2p}V^\dagger, U_{2p}) \\
&= D(VU_{2p}, U_{2p}V) \\
&\le D(VU_{2p}, U_{2p}) + D(U_{2p}, U_{2p}V) \\
&= D(V, I) + D(I, V) \\
&= 2D(U_1, U_2). \qquad □
\end{align*}

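The next lemma is a converse to this:

Lemma 6.8.2. If for all $\sigma_p \in \mathcal{P}$

\[ D^+(U_1\sigma_pU_1^\dagger, U_2\sigma_pU_2^\dagger) \le \delta \tag{6.8.1} \]

then

\[ D(U_1, U_2) \le \delta. \tag{6.8.2} \]

Proof. If $D^+(U_1\sigma_pU_1^\dagger, U_2\sigma_pU_2^\dagger) \le \delta$ then $\frac{1}{d}\mathrm{Re}\,\mathrm{tr}\, U_1\sigma_pU_1^\dagger U_2\sigma_pU_2^\dagger \ge 1 - \delta^2$. Since this
is true for all $\sigma_p$, we can take the average of this over the whole of $\mathcal{P}$ and use the fact
that for any $d \times d$ matrix $A$, $\mathbb{E}_p\, \sigma_pA\sigma_p = \frac{\mathrm{tr}\, A}{d} I$ (the Paulis are a 1-design) to
find

\[ \frac{1}{d}\mathrm{Re}\,\mathrm{tr}\, U_1 \left( \frac{\mathrm{tr}\, U_1^\dagger U_2}{d} \right) U_2^\dagger \ge 1 - \delta^2 \tag{6.8.3} \]

which simplified gives

\[ \left| \frac{\mathrm{tr}\, U_1U_2^\dagger}{d} \right|^2 \ge 1 - \delta^2 \tag{6.8.4} \]

giving the desired result. □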
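The 1-design identity used in this proof, that averaging $\sigma_pA\sigma_p$ over the Paulis gives $\frac{\mathrm{tr}\, A}{d} I$, can be checked directly for a single qubit ($d = 2$). The sketch below is a numerical illustration only; the matrix $A$ and the function names are arbitrary choices, not from the text.

```python
def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


PAULIS = [
    [[1, 0], [0, 1]],        # I
    [[0, 1], [1, 0]],        # X
    [[0, -1j], [1j, 0]],     # Y
    [[1, 0], [0, -1]],       # Z
]


def pauli_twirl(a):
    """Average of sigma A sigma over the four single-qubit Paulis (each is Hermitian)."""
    total = [[0, 0], [0, 0]]
    for sigma in PAULIS:
        term = matmul(matmul(sigma, a), sigma)
        total = [[total[i][j] + term[i][j] for j in range(2)] for i in range(2)]
    return [[total[i][j] / len(PAULIS) for j in range(2)] for i in range(2)]


if __name__ == "__main__":
    a = [[1, 2], [3, 4]]      # arbitrary test matrix with tr A = 5
    print(pauli_twirl(a))     # expected to equal (tr A / 2) times the identity
```

For this $A$ the off-diagonal entries cancel and both diagonal entries equal $\mathrm{tr}\, A / 2 = 2.5$.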
Now we show how to go from distances for just the generators $G$ to distances for
the whole of $\mathcal{P}$:

Lemma 6.8.3. If for all $\sigma_g \in G$

\[ D^+(U_1\sigma_gU_1^\dagger, U_2\sigma_gU_2^\dagger) \le \delta \tag{6.8.5} \]

then for all $\sigma_p \in \mathcal{P}$

\[ D^+(U_1\sigma_pU_1^\dagger, U_2\sigma_pU_2^\dagger) \le 2n\delta. \tag{6.8.6} \]

Proof. The proof is by induction on the number of generators required to make $\sigma_p$,
using the triangle inequality for $D^+$. □